Message from the General Chair

Within the past few years, we have seen the development and application of
expert systems at every NASA center. Today, we have operative expert systems
at all the NASA centers and active expert system development at many
organizations within each center. The rapid deployment and development of
expert systems was the direct result of the availability of hardware and
software development support system environments. As dedicated artificial
neural systems and fuzzy logic hardware with supporting software become
available, application and development systems will proliferate as fast as
expert system technology.

The major purpose of this workshop is to provide researchers and
practitioners with an opportunity to exchange information in order to
determine the requirements and capabilities of technologies necessary to
launch these innovative methodologies into mature and productive operative
environments.

Robert H. Brown


Message from the Technical Chair

As you read the program, you will note that the list of invited speakers is
like a "who's who" of the worlds of neural networks and fuzzy logic. We have
assembled here an extraordinary group of researchers. I extend to them my
deepest appreciation for their willingness to be an integral part of this
unique event.

It is a rare opportunity to be at the genesis of an entirely new field of
technology. Although the underpinnings of this technology have deep roots in
the past, we are, at this moment, witnessing the assembling of the critical
mass of experimental and theoretical endeavors to bring the fruits of
research in neural networks and fuzzy logic into the application domain. I am
sure that those of us participating in this workshop will remember it as a
significant event in the evolution of our fledgling discipline. I believe
that fuzzy logic is vital to our future control systems, and that neural
networks are the key to many of the crucial problems that have eluded the
present technology. With our combined efforts, we are taking an important
step in the quest for the essence of artificial intelligence.

Robert T. Savely
MESSAGE FROM THE TECHNICAL AREA DEVELOPERS

Neural Networks

From a research point of view, artificial neural systems offer many hopes
toward synthetically patterning the complex behaviors found in nature. From
an applications point of view, artificial neural systems offer hope for a
more efficient method of developing control systems, which conventionally are
difficult to model. Artificial neural systems are particularly exciting
because this is a science which enforces the collaboration of the biological
sciences, physical sciences, cognitive sciences, and engineering sciences.
This collaboration between sciences is not found elsewhere, and is the basis
for a true whole science.

Fuzzy sets, although different from artificial neural systems, are also
adapted toward modeling complex systems that were difficult or impossible to
model with conventional methods. The combination of these fields shows great
promise toward capturing that elusive problem of modeling nature's complex
behaviors.

I would like to extend my sincere appreciation and gratitude to the committee
members, the invited speakers, and the participants for making this workshop
productive and interesting. I would also like to wish each of you much
prosperity and good fortune in the pursuit of these exciting fields.

James Villarreal


This workshop offers a unique opportunity for scientists and engineers who
have common interests in fuzzy sets and/or neural network technology to meet
and discuss problems of mutual concern. The workshop will feature
presentations of current work by noted experts in the above mentioned fields.
The areas of discussion will vary widely but will all have commonality in
dealing with decisionmaking in environments where knowledge is imprecise,
vague, or incomplete and exact models are impossible or at least impractical
to build.

The fields of fuzzy sets and neural networks, although considerably different
in approach to problem solutions, seem to have some common areas of
applicability, for example, in visual and voice pattern recognition. It is
hoped that at this workshop much valuable information will be exchanged and
some problems of mutual interest in both fuzzy sets and neural networks will
surface.

We are indeed grateful to the prominent scientists who have agreed to provide
their time and share their ideas with us. Without them, this workshop could
not be successful. However, we also wish to acknowledge the fact that a
successful workshop atmosphere depends on the communication and interaction
of all attendees. We sincerely appreciate everyone's participation and
support.

Robert N. Lea


Neural Network/Fuzzy Logic Program Overview 


Monday, May 2

7:30 Registration

8:00-8:30 Welcome and Introduction

OPENING ADDRESS SPEAKERS:

8:30-9:30 Stephen Grossberg - Boston University

Emergent Invariants of Self-organizing Neural Networks for
Pattern Recognition and Robotics

9:30-10:30 Bart Kosko - University of Southern California

Fuzzy Theory and Neural Networks

10:30-11:30 Lotfi Zadeh - University of California

The Role of Fuzzy Sets in the Treatment of Uncertainty in
Control Processes and Knowledge Representation

SESSION 1

1:00-1:30 Takeshi Yamakawa - Kumamoto University

A Fuzzy Microprocessor: A Novel Device for High-Speed
Approximate Reasoning

1:30-2:00 Masaki Togai - Togai Infralogic, Inc.

Fuzzy and Neural Net Processor and its Programming
Environment

2:00-2:30 Hiroyuki Watanabe - University of North Carolina

Fuzzy Logic Inference Processor: Custom VLSI Design for
System Integration

2:30-3:00 Kaoru Hirota - Hosei University

An Application of Fuzzy Logic to Robotic Vision and Control

SESSION 2

3:15-3:45 Demetri Psaltis

Optical Neural Computers

3:45-4:15 Harold Szu - Naval Research Laboratory

What is the Significance of Neural Networks for AI?

4:15-4:45 Daniel Levine - University of Texas at Arlington

Neural Modeling of Selective Attention


KEYNOTE DINNER 


7:00-8:00 COCKTAIL RECEPTION 

8:00-9:30 DINNER - DR. LEON COOPER - KEYNOTE SPEAKER 


Tuesday, May 3 

SESSION 3 

8:30-9:00 James Bower - California Institute of Technology 

Applied and Real Neural Networks : A Coordinated and 
Interdependent Investigation of Both 

9:00-9:30 Walter Freeman - University of California, Berkeley 

Implementation of Pattern Recognition Algorithms Derived 
from Olfactory Information Processing 

9:30-10:00 Guenter Gross - North Texas State University 

Multi-electrode Burst Pattern Feature Extraction from 
Mammalian Networks in Culture 

10:00-10:30 Mike Myers - TRW ANS Center 

A Hybrid Connectionist - AI Architecture for Reflective and 
Exploratory Systems 

SESSION 4 

10:45-11:15 William Siler - Mote Marine Laboratory
Applications of a Fuzzy Expert System 

11:15-11:45 Maria Zemankova - University of Tennessee 

Intelligent Information Systems with Learning Capabilities 

SESSION 5 

1:15-1:45 Pentti Kanerva - RIACS, NASA Ames Research Center

Understanding Information - processing in Animals as a Way 
to Building Intelligent Robots 

1:45-2:15 Claude Cruz - Plexus Systems 

Knowledge Processing Using Neural Networks 

2:15-2:45 Rod Taber - General Dynamics 

Fuzzy Logic Operators and Neuron Activation Fields 


2:45-3:15 Douglas Reilly - Nestor, Inc.

Adaptive Pattern Recognition Using a Multi-Neural Network 
Learning System 


3:30-4:00 James Bezdek - Boeing 

Knowledge Representation by Linguistic Transitive Closures of 
Trapezoidal Fuzzy Members 

4:00-4:30 Bill Buckles - Tulane University 

Relationship Between Uncertainty and Databases and Expert 
Systems 

4:30-5:00 James Buckley - University of Birmingham 
Linear Fuzzy Controller 


* Busing will be provided for lunch hours only and will depart from the
Gilruth Recreation Center, arriving at the Building 11 cafeteria for lunch.
Buses will pick up those eating at the cafeteria and return them to the
Gilruth at specified times.

Monday  - 11:45 am departure
          12:45 pm arrival

Tuesday - 12:00 noon departure
          1:00 pm arrival


Neural Networks/Fuzzy Logic Committee Members 


General Chair 

Robert H. Brown, NASA/JSC 

Technical Chair 

Robert T. Savely, NASA/JSC 


Technical Area Developers 

Artificial Neural Systems 
James A. Villarreal, NASA/JSC 

Fuzzy Logic 

Robert N. Lea, NASA/JSC 


Executive Chair 
Sandy Griffin, NASA/JSC 

Assistant Executive Chair 
Carla Armstrong, NASA/Barrios 

Administrative Chair 

Carol Kasworm, U of H/Clear Lake 

Local Publicity Chair 

Daniel C. Bochsler, NASA/Lincom 
Keynote Speaker
Leon Cooper, Ph.D. 



Dr. Cooper is cofounder and chairman of the board of Nestor, Inc. He 
received his A.B. in 1951, his A.M. in 1953, and a Ph.D. in 1954, all from
Columbia University. He has been a professor, an associate professor, or a 
visiting professor at various universities and summer schools. He has also 
served as a consultant for various governmental agencies, and industrial and 
educational organizations. Dr. Cooper has given various public lectures and 
presented papers at international conferences and symposia. Throughout his 
career, he has been the recipient of the following fellowships and awards: 

Nobel Prize (with J. Bardeen and J. R. Schrieffer), 1972 

Award of Excellence, Graduate Faculties Alumni of Columbia University, 1974 
Descartes Medal, Academie de Paris, Universite Rene Descartes, 1977 
John Jay Award, Columbia College, 1985 

Who's Who, Who's Who in America, Who's Who in the World, various other
listings

Comstock Prize (with J. R. Schrieffer), National Academy of Sciences, 1968 
NSF Postdoctoral Fellow, 1954-55 

Alfred P. Sloan Foundation Research Fellow, 1959-66 

John Simon Guggenheim Memorial Foundation Fellow, 1965-66 

Fellow, American Physical Society, American Academy of Arts and Sciences 

Member, American Philosophical Society; National Academy of Sciences; 

Sponsor Federation of American Scientists; Society for Neuroscience; 
American Association for Advancement of Science; Institute for Advanced 
Study, 1954-55; Conseil Superieur de la Recherche de l'Universite Rene
Descartes (Academie de Paris, Paris V); Phi Beta Kappa; Sigma Xi 

Doctor of Sciences (honoris causa), Columbia University, 1973; University 
of Sussex, 1973; University of Illinois, 1974; Brown University, 1974; 
Gustavus Adolphus College, 1975; Ohio State University, 1976; Universite 
Pierre et Marie Curie, Paris, 1977 
Stephen Grossberg, Ph.D.

Boston University 
Boston, Massachusetts 


Dr. Grossberg received his graduate training at Stanford University and 
Rockefeller University, and was a professor at the Massachusetts Institute 
of Technology before assuming his present position at Boston University. He 
is professor of mathematics, psychology, and biomedical engineering at 
Boston University, where he founded and is the director of the Center for 
Adaptive Systems. He is also the director of the university's new graduate 
program in cognitive and neural systems. In addition, Dr. Grossberg is 
president of the International Neural Network Society and coeditor-in-chief 
of the society's journal, Neural Networks. During the past few decades, Dr. 
Grossberg and his colleagues at the Center for Adaptive Systems have 
pioneered and developed a number of the fundamental principles, mechanisms, 
and architectures that form the foundation for contemporary neural network 
research. 


EMERGENT INVARIANTS OF SELF-ORGANIZING 
NEURAL NETWORKS FOR PATTERN RECOGNITION AND ROBOTICS 

Abstract 

Described are several real-time neural network architectures that are 
capable of self-organizing invariant behavioral properties in applications 
to sensory pattern recognition, cognitive information processing, and 
adaptive sensory-motor control. These invariants include a similarity 
invariant that arises in adaptive pattern recognition and cognitive 
information processing; a position invariant that arises in determining the 
location of a target with respect to the head; and a synchrony invariant 
that enables motor systems with multiple degrees of freedom, such as arms 
and speech articulators, to generate flexible and synergetic planned 
movements. 



EMERGENT INVARIANTS OF SELF-ORGANIZING NEURAL NETWORKS 
FOR PATTERN RECOGNITION AND ROBOTICS 

Stephen Grossberg 


A lecture delivered at 

The 1988 First Joint Technology Workshop 
on Neural Networks and Fuzzy Logic 

May 2, 1988 

Lyndon B. Johnson Space Center 
Houston, Texas 
FUZZY THEORY AND NEURAL NETWORKS
Abstract 

Does fuzziness differ from probability? How does fuzzy theory relate to 
neural networks? These questions are addressed. There are many other 
connections between neural networks and fuzzy theory. Besides fuzzy entropy 
minimizers, fuzzy associative memories (FAM's) map fuzzy subsets to fuzzy 
subsets. Simple FAM's can be constructed using a fuzzy Hebb law and 
max-min composition instead of vector-matrix multiplication. Another
example is fuzzy causal networks, or fuzzy-cognitive-maps, feedback- 
knowledge networks that admit degrees of causality and perform forward- 
chaining inference without graph search. 
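
As an illustration of the FAM construction mentioned above, the following is a minimal sketch in Python. The abstract does not define the fuzzy Hebb law, so the sketch assumes the minimum-based encoding m[i][j] = min(a[i], b[j]); the function names and the example vectors are hypothetical and chosen only for illustration.

```python
def fuzzy_hebb(a, b):
    """Assumed fuzzy Hebb encoding: m[i][j] = min(a[i], b[j])."""
    return [[min(ai, bj) for bj in b] for ai in a]


def max_min_compose(a, m):
    """Max-min composition: out[j] = max over i of min(a[i], m[i][j]).

    This replaces the sum-of-products of ordinary vector-matrix
    multiplication with max-of-minimums.
    """
    return [max(min(ai, row[j]) for ai, row in zip(a, m))
            for j in range(len(m[0]))]


# Illustrative fuzzy subsets written as membership vectors (not from the paper).
A = [0.3, 1.0, 0.6]
B = [0.4, 0.9]

M = fuzzy_hebb(A, B)              # FAM matrix storing the pair (A, B)
recalled = max_min_compose(A, M)  # present A, recall the associated subset
print(recalled)
```

With this encoding, presenting A recalls B exactly whenever the largest membership value in A is at least as large as the largest in B, which is the case for the vectors above.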
DISPOSITIONAL LOGIC AND COMMONSENSE REASONING

Abstract 

Dispositional logic (DL) is a branch of fuzzy logic which is concerned with 
inference from dispositions, or propositions, which are preponderantly, but 
not necessarily, true. Simple examples of dispositions are: birds can fly,
snow is white, and Swedes are blonde. The importance of the concept of a
disposition derives from the fact that much of commonsense knowledge may be
viewed as a collection of dispositions. Dispositional logic provides an
alternative approach to the theories of default reasoning, nonmonotonic
reasoning, circumscription, and other widely-used approaches to commonsense 
reasoning. The premises in DL are assumed to be of the form usually (X is 
A) or usually (Y is B if X is A), where A and B are fuzzy predicates which 
play the role of elastic constraints on the variables X and Y. Inference 
from such premises reduces, in general, to the solution of a nonlinear 
program. In many cases, an inference rule in DL has the form of a fuzzy 
syllogism. The importance of dispositional logic transcends its function as 
a basis for formalization of commonsense reasoning. Viewed in a broader 
perspective, it underlies the remarkable human ability to make rational 
decisions in an environment of uncertainty and imprecision. 
A FUZZY MICROPROCESSOR - A NOVEL
DEVICE FOR HIGH-SPEED APPROXIMATE REASONING 

Abstract 

A fuzzy controller hardware device demonstrated in the second IFSA Congress 
has distinctive features: (1) high speed (1 000 000 FIPS); (2) easy 
programming; (3) suitability for nonlinear and/or time-variant systems; and 
(4) robustness against the noise, temperature change, power supply 
fluctuation, and defect of transistors. The hardware also has a slight 
misprogramming. The rule board and the defuzzifier board are reduced to
small chips. They are a rule chip and a defuzzifier chip. By employing 
these two types of chips, a sophisticated fuzzy controller hardware system 
can be easily implemented. 
FUZZY AND NEURAL NET PROCESSOR AND ITS PROGRAMMING ENVIRONMENT

Abstract 

The fuzzy logic inference processor (FLIP) is a slave processor designed to 
speed rule evaluation in high-speed, real-time oriented expert systems. It 
interfaces easily as a slave processor to standard microprocessors and 
microcontrollers, and is capable of operating without intervention from the 
host system. The FLIP device is capable of inferencing using two distinct 
paradigms: fuzzy and neural. The fuzzy paradigm grades the observation
values as to their degree of support of the premise, then weighs and merges
conclusions based upon the degree of support each premise receives. The 
neural paradigm weighs each of the inputs, sums all of the weighted inputs, 
then applies a transfer function to derive the output. Any combination of 
these paradigms may be included in a knowledge base. The software system to 
support the development of fuzzy logic system or neural net descriptions for
the FLIP is also under development. This user-friendly software interfaces
with the FLIP for evaluation of fuzzy and neural systems, allowing considerable
flexibility in developing rules and rule evaluations with capacity for trace 
and truth maintenance. Use of symbolic representation and "human 
definitions" greatly simplifies the job of knowledge acquisition. 
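
The neural paradigm described in the abstract (weigh each input, sum the weighted inputs, apply a transfer function) can be sketched as follows. This is only an illustrative Python model, not the FLIP instruction set; the function names, the logistic transfer function, and the numeric values are assumptions made for the example.

```python
import math


def neuron_output(inputs, weights, transfer):
    """Weigh each input, sum the weighted inputs, apply the transfer function."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return transfer(total)


def logistic(s):
    """One possible transfer function (an assumption; any callable would do)."""
    return 1.0 / (1.0 + math.exp(-s))


# Illustrative values: the weighted sum is 0.2 + 0.2 - 0.4 = 0.0.
y = neuron_output([1.0, 0.5, -1.0], [0.2, 0.4, 0.4], logistic)
print(y)
```

Passing the transfer function as an argument mirrors the idea of a user-definable transfer function: the weighted-sum step stays fixed while the final mapping can be swapped.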
ARTIFICIAL INTELLIGENCE ON A CHIP

SOFTWARE ENVIRONMENT 

Software developed in ANSI Standard C 

Graphical interface developed in Microsoft Windows ™ 

• Uniform graphical interface 

• Screen-cut & text/graphics for documentation 

• DOS executive provided 

Graphical environment provides ease of knowledge acquisition 

• Schematic representation of networks 
TIL HOST SYSTEM CONFIGURATION

Host System : Togai InfraLogic 386 PC 

Minimum Configuration: 

Processor: 16MHz 386 
Memory: 2 Mb 
Monitor: EGA color 

Disk Memory: 20Mb hard disk or greater 
1.2 Mb floppy disk drive 
360 Kb floppy disk drive 
Disk Operating System: DOS 3.2 or higher 
Slot Configuration: 6 AT slots 

Additional Hardware:

Togai InfraLogic Net-Processor Board 
Togai InfraLogic General Purpose I/O Board 
FUZZY NET


Real-Time Inferencing 
Knowledge Acquisition Support Software 
Flexibility of Connection & Membership Graphs 
Eight Bit Computational & I/O Resolution 
Trace Back & Storage 
Reduction in Chip Level I/O 
ARTIFICIAL INTELLIGENCE ON A CHIP

TIL NEURAL PARADIGM 
Real-Time Processing 
Flexibility of Connection 
Eight Bit Computational & I/O Resolution 
Weighting Accuracy 1% 

All Nodes Resident & Visible 
Reduction in Chip Level I/O 

10^3 x Connectivity of Analog Solutions
Stable Across Voltage and Temperature 
NOTABLE SYSTEM PERFORMANCE


CHIP 

• Single Chip Net Processor 

• Scalable SIMD Architecture

• Cascadable to MSIMD Architecture 

• 10-20 MHz Clock Rate

FUZZY APPROXIMATE REASONING 

• Processes up to 128 Production Rules Simultaneously 

• Up to 256 Inputs and Outputs per Production Rule 

• Greater than 20K FLIPS

• Greater than 200K Production Rule Evaluations per Second 
NEURAL PROCESSING 

• Processes up to 16 Neurons Simultaneously 

• Greater than 65,000 Inputs per Neuron 

• Transfer Function User Definable 

• Greater than 2M Connections per Second 
FUZZY LOGIC INFERENCE PROCESSOR -
A CUSTOM VLSI DESIGN FOR SYSTEM INTEGRATION 

Abstract 

The VLSI implementation of a fuzzy logic inference mechanism allows the use 
of rule-based control and decisionmaking in demanding real-time applications 
such as robot control, and the area of command and control. The full custom 
CMOS VLSI is described. The chip is the second generation of such a design
and has several design features which make its use realistic. These features
include reconfigurable architecture, on-chip fuzzification and
defuzzification, and memory and data-path redundancy for higher yield. The
chip consists of 614 000 transistors, of which 460 000 are used for random
access memory. For the fuzzy inference chip to be useful, we must package it
into a system integrating hardware and software. We need to provide a
user-friendly interface for control engineers. We are developing a system that
combines graphic and text inputs in a multiple-window environment. For rule set
programming, a multiple-window environment provides editing and display 
facilities for the fuzzy rule sets, for fuzzy variables, and for the fuzzy 
set membership functions. Separate text and graphic windows interact with 
the user and display the developing system in various modes from different 
levels of abstraction. Simulation of the rule execution also can be 
displayed in graphic form. 
Abstract

The VLSI implementation of a fuzzy logic inference mechanism allows the use of rule-based
control and decision making in demanding real-time applications such as robot control and
in the area of command and control. The full custom CMOS VLSI is described. The chip
is the second generation of the design. It has several design features which make the use of this
chip realistic. These features include reconfigurable architecture, on-chip fuzzification and
defuzzification, and memory and data-path redundancy. The chip consists of 614,000 transistors,
of which 460,000 are used for RAM.


1 Introduction 

Fuzzy logic based control uses a rule-based expert system paradigm in the area of real-time
process control [4]. It has been used successfully in numerous areas including chemical process
control, train control [12], cement kiln control [2], and control of small aircraft [5]. In order
to use this paradigm of a fuzzy rule-based controller in demanding real-time applications, the
VLSI implementation of the inference mechanism has been an active research topic [9,10,11].
Potential applications of such a VLSI inference processor include real-time decision-making
in the area of command and control [3], control of precision machinery [1], and robotic
systems [6].

We have been designing a second-generation VLSI fuzzy logic inference engine on a chip.
The new architecture of the inference processor has the following important improvements
compared to previous work:

1. programmable rule set memory 

2. on-chip fuzzifying operation - table lookup 

3. on-chip defuzzifying operation - center of area algorithm 

4. reconfigurable architecture 

5. RAM redundancy for higher yield 

The original prototype experimental chip (designed at AT&T Bell Labs) had minimal logic 
on chip. For example, it used ROM for the rule set memory which reduced its utility [10]. 
We are now designing a more realistic chip which has RAM for the rule set memory so that 
rules can be programmable. In addition to the fuzzy inference mechanism, the fuzzifying and 
defuzzifying operations are performed on chip. The new design has a reconfigurable architecture 
such that we can have either 51 rules, 4 inputs and 2 outputs, or 102 rules, 2 inputs and 1 
output. These new design decisions render the new architecture realistic. 
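
A quick arithmetic check suggests why these two particular configurations pair up. The sketch below assumes, as an illustration only, that each IF-clause and each THEN-action of a rule occupies one stored fuzzy set; the dictionary layout and function name are hypothetical.

```python
# The two configurations named in the text.
CONFIGS = {
    "4 inputs, 2 outputs": {"rules": 51, "inputs": 4, "outputs": 2},
    "2 inputs, 1 output": {"rules": 102, "inputs": 2, "outputs": 1},
}


def fuzzy_sets_stored(cfg):
    """Assumed storage model: one fuzzy set per IF-clause and per THEN-action."""
    return cfg["rules"] * (cfg["inputs"] + cfg["outputs"])


totals = {name: fuzzy_sets_stored(cfg) for name, cfg in CONFIGS.items()}
for name, count in totals.items():
    print(name, "->", count, "fuzzy sets")
```

Under that assumption both configurations store the same number of fuzzy sets (51 x 6 = 102 x 3 = 306), which is consistent with a fixed rule memory being reorganized rather than resized.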

2 Fuzzy Set and Fuzzy Logic 

A fuzzy set is based on a generalization of the concept of the ordinary set. In an ordinary set,
we associate a characteristic function with each set. For example, we can define a set S with its
characteristic function $f_S : U \to \{0, 1\}$. Then, for all e in the universal set U,

$$e \in S \quad \text{if } f_S(e) = 1,$$
$$e \notin S \quad \text{if } f_S(e) = 0.$$

Each element of the universe either belongs to or does not belong to the set S. In a fuzzy set,
an element can be a member of the set with varying degree of membership. The associated
characteristic function, therefore, returns any real number between 0 and 1, and it is termed
the membership function. For a fuzzy set F, we have an associated membership function
$\mu_F : U \to [0, 1]$. For example, if element e is a member of fuzzy set F with degree 0.34, the
associated membership function returns this value, $\mu_F(e) = 0.34$. If $\mu_F(e) = 0$, e is entirely
outside of fuzzy set F, and if $\mu_F(e) = 1$, e is entirely inside of fuzzy set F. A fuzzy set is
represented by a set of ordered pairs of an element $u_i$ and its grade of membership:

$$F = \{(u_i, \mu_F(u_i))\}, \quad u_i \in U$$

where U is a universe of discourse. Using a fuzzy set, we can represent and manipulate imprecise
and vague concepts and data. For example, approximately 100 km/h is represented by the fuzzy
set whose membership function is shown in Figure 1.

Figure 1: Approximately 100 km/h.

We can extend classical set theory by defining basic set theoretic operations over fuzzy sets.
The following definitions of intersection and union of fuzzy sets were suggested by Zadeh [13].
The set theoretic operations with fuzzy sets are defined via their membership functions. Let A
and B be fuzzy sets; then the union, intersection, and complement of the fuzzy sets are defined
as follows. The membership function of the intersection $C = A \cap B$ is defined by

$$\mu_C(e) = \min(\mu_A(e), \mu_B(e)), \quad e \in U.$$

The membership function of the union $D = A \cup B$ is defined by

$$\mu_D(e) = \max(\mu_A(e), \mu_B(e)), \quad e \in U.$$

The membership function of the complement $\neg A$ of A is defined by

$$\mu_{\neg A}(e) = 1 - \mu_A(e), \quad e \in U.$$
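
The three operations just defined can be tried out on a small discrete universe. The following is a minimal Python sketch; the universe, the membership values, and the names `approx_100` and `fast` are made up for illustration and are not read off Figure 1.

```python
def fuzzy_intersection(mu_a, mu_b):
    """Membership of A intersect B: pointwise minimum."""
    return {e: min(mu_a[e], mu_b[e]) for e in mu_a}


def fuzzy_union(mu_a, mu_b):
    """Membership of A union B: pointwise maximum."""
    return {e: max(mu_a[e], mu_b[e]) for e in mu_a}


def fuzzy_complement(mu_a):
    """Membership of not-A: one minus the membership in A."""
    return {e: 1.0 - mu_a[e] for e in mu_a}


# Hypothetical fuzzy sets over speeds in km/h (illustrative values only).
approx_100 = {80: 0.0, 90: 0.5, 100: 1.0, 110: 0.5, 120: 0.0}
fast = {80: 0.0, 90: 0.25, 100: 0.5, 110: 0.75, 120: 1.0}

both = fuzzy_intersection(approx_100, fast)
either = fuzzy_union(approx_100, fast)
not_approx_100 = fuzzy_complement(approx_100)
print(both, either, not_approx_100, sep="\n")
```

When every membership value is 0 or 1, these definitions reduce to the ordinary set operations, which is the sense in which they extend classical set theory.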

In traditional logic, one of the most important inference rules is modus ponens, that is

    Premise:      A is true
    Implication:  If A then B
    Conclusion:   B is true


Here, A and B are crisply defined propositions. We can construct a fuzzy proposition using a 
fuzzy set such as: 

Current speed is approximately 100 km/h. 

By introducing fuzzy propositions into modus ponens, we can generalize modus ponens. Let
C, C', D, D' be fuzzy sets. Then the generalized modus ponens states:

    Premise:      x is C'
    Implication:  If x is C then y is D
    Conclusion:   y is D'


We can use different premises to arrive at different conclusions using the same implication. For
example,

    Premise:      Visibility is slightly low
    Implication:  If visibility is low then condition is poor
    Conclusion:   Condition is slightly poor

or

    Premise:      Visibility is very low
    Implication:  If visibility is low then condition is poor
    Conclusion:   Condition is very poor

Figure 2: Inference.



The above inference is based on the compositional rule of inference for approximate reasoning 
proposed by Zadeh [14]. Suppose we have two rules with two fuzzy clauses in the IF-part and 
one clause in the THEN-part: 

Rule 1: If (x is $A_1$) and (y is $B_1$) then (z is $C_1$),

Rule 2: If (x is $A_2$) and (y is $B_2$) then (z is $C_2$).

We can combine the inference of the multiple rules by assuming the rules are connected by the
OR connective, that is, Rule 1 OR Rule 2 [10].

Given fuzzy propositions (x is $A'$) and (y is $B'$), weights $\alpha_i^A$ and $\alpha_i^B$ of the clauses of
the premises are calculated by:

$$\alpha_i^A = \max(\mu_{A' \cap A_i}(e)), \quad e \in X$$
$$\alpha_i^B = \max(\mu_{B' \cap B_i}(e)), \quad e \in Y \quad \text{for } i = 1, 2.$$

Then, weights $w_1$ and $w_2$ of the premises are calculated by:

$$w_1 = \min(\alpha_1^A, \alpha_1^B),$$
$$w_2 = \min(\alpha_2^A, \alpha_2^B).$$

Weight $\alpha_i^A$ represents the closeness of proposition (x is $A_i$) and proposition (x is $A'$). Weight
$w_i$ represents a similar measure for the entire premise of the i-th rule. The conclusion of the first
rule is

$$C_1' = \min(w_1, C_1).$$

The conclusion of the second rule is

$$C_2' = \min(w_2, C_2).$$

The overall conclusion $C'$ is obtained by

$$C' = \max(C_1', C_2').$$

This inference process is shown in Figure 2. In this example, $\alpha_1^A = 0.5$ and $\alpha_1^B = 0.25$, therefore
$w_1 = 0.25$. $\alpha_2^A = 0.85$ and $\alpha_2^B = 0.5$, therefore $w_2 = 0.5$.
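
The two-rule inference above can be traced in a few lines of Python. This is a sketch under stated assumptions: membership functions are sampled on a small discrete universe, the clause weights are the values quoted in the text for Figure 2, and the consequent sets `C1` and `C2` and all function names are hypothetical.

```python
def clause_weight(mu_observed, mu_rule):
    """Height of the intersection: max over e of min(observed(e), rule(e))."""
    return max(min(o, r) for o, r in zip(mu_observed, mu_rule))


def premise_weight(clause_weights):
    """Clauses joined by AND: take the minimum of the clause weights."""
    return min(clause_weights)


def clip(weight, mu_consequent):
    """Conclusion of one rule: the consequent cut off at the premise weight."""
    return [min(weight, c) for c in mu_consequent]


def combine(conclusions):
    """Rules joined by OR: pointwise maximum of the clipped consequents."""
    return [max(values) for values in zip(*conclusions)]


# Clause weights quoted in the text for the example of Figure 2.
w1 = premise_weight([0.5, 0.25])
w2 = premise_weight([0.85, 0.5])

# Hypothetical consequent fuzzy sets sampled at five points.
C1 = [0.0, 0.5, 1.0, 0.5, 0.0]
C2 = [0.0, 0.0, 0.5, 1.0, 0.5]

C_prime = combine([clip(w1, C1), clip(w2, C2)])
print(w1, w2, C_prime)
```

The helper `clause_weight` shows how a clause weight would be obtained from sampled membership functions; the example itself starts from the quoted weights because the underlying curves of Figure 2 are not available here.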

3 Rule-based Controller 

The usual approach for automatic process control is to establish a mathematical model of the 
process. However, this is not always feasible. In some cases, there is no proper mathematical 
model because the process is too complex or ill-understood. In other cases, experimenting 
with plants for construction of mathematical models is too expensive. In still other cases, the 
mathematical models are too complicated or computationally expensive and are not suitable 
for real time use. For such processes, however, skilled human controllers may be able to operate 
the plant satisfactorily. The operators are quite often able to express their operating practice 
in the form of rules which may be used in a rule-based controller. The rule based controllers 
model the behavior of the expert human operator instead of the process. The following is a 
rule from an aircraft flight controller [5]. This rule takes three inputs and has two outputs. 

If (1) The rate of descent is Positively Medium, 

(2) The airspeed is Negatively Big (compared to the desired airspeed), 

(3) The glide slope is Positively Big (compared to the desired slope). 

Then (1) change engine speed by Positively Big , and 

(2) change elevator angle by Insignificant Change. 

The expressions, Positively Medium, Positively Big, Insignificant Change, and others represent 
imprecise amounts. They represent intuitive feel of the expert human controller. They cor- 
respond to the imprecise expressions used by the expert for communicating a rule of thumb. 
They are represented by using fuzzy sets and their associated membership functions. 

A fuzzy set such as Positively Medium is represented by a membership function over an
appropriate universe of discourse such as revolutions per minute (rpm). Possible definitions
of fuzzy sets are shown in Figure 3. The control rules are typically encoded using 10 to 70
rules. Control is performed based on the fuzzy inference mechanism described in Section
2 and Figure 2. In controlling a process, all of the rules are compared to the current inputs
(observations) and fired. The actions (THEN-part) of each rule are weighted by how closely its
IF-part matches the current observation. In the example of Figure 2, a rule has two inputs and
a single output. The weights are represented by $w_1$ and $w_2$. The results of firing each rule
are then combined by superimposing them. The final result which is supplied to a controller
should be a crisp number rather than a fuzzy set; therefore we need to perform a defuzzifying
operation. This is computed by taking the center of area under the fuzzy membership function of
the final result. Even though each individual rule is an incomplete rule of thumb, the results of
firing each rule are properly weighted and combined, and the final result represents a reasonable
compromise.

Figure 3: Typical fuzzy sets.
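
The center-of-area step described above can be sketched as follows. The text does not give the exact formula, so this Python example assumes the usual discrete form, the membership-weighted average of the element values; the function name and sample values are illustrative.

```python
def center_of_area(universe, mu):
    """Discrete center of area: sum(u * mu(u)) / sum(mu(u)).

    Assumed form of the defuzzifying operation; raises if the fuzzy set
    is empty because the centroid is then undefined.
    """
    area = sum(mu)
    if area == 0:
        raise ValueError("membership function is zero everywhere")
    return sum(u * m for u, m in zip(universe, mu)) / area


universe = [0, 1, 2, 3, 4]

# A symmetric result: the crisp output sits at the peak.
symmetric = center_of_area(universe, [0.0, 0.5, 1.0, 0.5, 0.0])

# A lopsided result, such as two superimposed clipped consequents.
lopsided = center_of_area(universe, [0.0, 0.25, 0.5, 0.5, 0.5])
print(symmetric, lopsided)
```

The lopsided case illustrates the compromise mentioned in the text: the crisp value lands between the regions favored by the individual rules, pulled toward the one with the larger weight.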

In order for a VLSI implementation of fuzzy inference to be useful, a fair amount of
pre-processing (fuzzifying) and post-processing (defuzzifying) must be performed on chip. The
AT&T prototype chip assumed that both of these processes are performed by the host
processor. However, the inference processing is too fast for fuzzifying and defuzzifying to
take place off-chip on a host processor. This assumption burdened the host processor and
nullified the advantage of VLSI implementation of the inference mechanism.

4 Chip Architecture and Implementation 

The process controller system is configured as in Figure 4. The VLSI implementation is done
with four components: a fuzzifier, a rule memory, an inference mechanism, and a defuzzifier
on a single chip. Each input and output data item is 6 bits. This fits well with available
A/D and D/A converters. In addition, our chip will communicate with a host processor. The
chip has a three-stage pipelined architecture. The pipeline consists of the IF-part, the
THEN-part, and the defuzzifier.

We considered the size of the fuzzy set and the grade of fuzziness needed for practical use. In most
cases, a fuzzy variable has three to sixteen elements and the grade of fuzziness has three to
twelve levels [5,8]. In this chip implementation, the universe of discourse of a fuzzy set is a finite
set with 64 elements (i.e. 6 bits). The membership function has 16 levels (i.e. 4 bits). That
is, 0 represents no membership, 15 represents full membership, and the other numbers represent
points in the unit interval [0, 1]. A fuzzy membership function is, therefore, discretized using
64 numbers of 4 bits each; that is, 256 bits of memory storage. The representation of a fuzzy set is
as follows:


    u_0     u_1     ...   u_i         ...   u_63
    0000    0011    ...   μ_F(u_i)    ...   0000

Figure 4: Fuzzy logic controller.

Fuzzifying is done using a table look-up. For each observation (i.e. input stream), we store
a table of the membership function normalized at the center of the horizontal axis. That is,
the full membership is at the center. The membership function is shifted according to the input
value, so the chip can produce 64 different membership functions from a single stored pattern.
The membership function can be associated with a predicted measurement error of a sensor.
If we do not need fuzziness in the observed value, we can store a pulse function, in which only
one entry has membership 1 and all the other entries are 0. The result of the fuzzifying is
broadcast to all of the rules. In the actual chip implementation, the content of the table is
not shifted. Rather, the starting address for the table look-up is shifted according to the
observation input.
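
The shifted start address can be illustrated with a short Python sketch. It assumes a 64-entry table with its peak stored at index 32 and assumes that addresses falling outside the stored table read as zero membership; neither detail is stated in the text, and the names `triangle_table`, `pulse_table`, and `fuzzify` are hypothetical.

```python
UNIVERSE = 64            # elements in the universe of discourse (6-bit input)
FULL = 15                # full membership on the 4-bit scale
CENTER = UNIVERSE // 2   # assumed position of the stored peak


def triangle_table(half_width):
    """Stored pattern: a triangle with full membership at CENTER."""
    return [max(0, FULL - FULL * abs(u - CENTER) // half_width)
            for u in range(UNIVERSE)]


def pulse_table():
    """Stored pattern for a crisp observation: a single full-membership entry."""
    return [FULL if u == CENTER else 0 for u in range(UNIVERSE)]


def fuzzify(table, observation):
    """Read the stored table from a shifted start address; the table itself never moves."""
    start = CENTER - observation
    out = []
    for u in range(UNIVERSE):
        address = start + u
        # Assumption: addresses outside the stored table read as zero membership.
        out.append(table[address] if 0 <= address < UNIVERSE else 0)
    return out


fuzzy = fuzzify(triangle_table(3), 10)   # peak moves to element 10
crisp = fuzzify(pulse_table(), 63)       # pulse moves to element 63
```

One stored pattern serves all 64 possible observations, which is the saving the text attributes to shifting the address rather than the table contents.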

The chip is re-configurable. Depending on the application, a control system can take four
inputs and produce two outputs, or take two inputs and produce one output. With the first
configuration, we can have 51 rules on a single chip. Each rule has four clauses in the IF-part
and two actions in the THEN-part.

If A and B and C and D 
Then Do E, and 
Do F. 

With the second configuration, we can execute 102 rules using the same data-paths. Each rule
has two clauses in the IF-part and one action in the THEN-part.

If A and B Then Do E, 

If C and D Then Do F. 

A data-path is assigned to each rule; therefore, all 51 or 102 rules are executed in parallel.
There are only two basic units: a parallel minimum unit and a parallel serial unit.
The former performs the intersection operation on fuzzy sets, and the latter performs the union
operation. The configuration of the IF-part of the data-path is shown in Figure 5. The data-path
can execute one rule with 4 IF-clauses or two rules with 2 IF-clauses. Four pairs of min/max units
compute the weights $\alpha$ for each clause. The min elements, organized as a binary tree, compute
the weight $w$ of the premise, which is the minimum of all the $\alpha$'s. In the 51 rule configuration,
the last two minimum units compute the same weight $w_i$. In the 102 rule configuration, streams of 1's
are supplied and these two min elements behave as delay elements. The configuration is
controlled by setting a bit in the status register from the host computer.

Defuzzifying is done by computing a center of area (COA) under the final membership function.
Denoting the final fuzzy subset as $A$, the COA algorithm computes the following:

$$ \mathrm{COA} = \frac{\sum_{n=0}^{63} n \, \mu_A(n)}{\sum_{n=0}^{63} \mu_A(n)} $$

Since each element of the universe is processed serially, we can substitute repeated addition for
multiplication in the above computation. The data sequence from the THEN-part is produced
starting from the most significant data point as follows:

$$ \mu_A(63),\ \mu_A(62),\ \ldots,\ \mu_A(1),\ \mu_A(0). $$

Figure 7: Redundancy

Two adders and two registers are used as shown in Figure 6. The first adder keeps a running sum
of the incoming data, which ends as the denominator. The second adder produces the numerator by
repeatedly adding the result of the first adder, which computes the following formula. The last
row ends at $\mu_A(1)$ because $\mu_A(0)$ has weight zero in the numerator.

$$
\begin{aligned}
\sum_{n=0}^{63} n\,\mu_A(n) ={}& \mu_A(63) \\
 &+ \mu_A(63) + \mu_A(62) \\
 &+ \mu_A(63) + \mu_A(62) + \mu_A(61) \\
 &+ \cdots \\
 &+ \mu_A(63) + \mu_A(62) + \mu_A(61) + \cdots + \mu_A(1).
\end{aligned}
$$
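
The way repeated addition replaces multiplication can be checked with a small Python sketch. It is a software illustration only, not a description of the chip's circuit: the two variables stand in for the two registers, the name `coa_by_repeated_addition` is hypothetical, and the sketch assumes that the final running sum (the one that includes element 0) is not added into the numerator, so that element 0 keeps weight zero.

```python
def coa_by_repeated_addition(mu):
    """Return (numerator, denominator) of the COA using additions only.

    The data are consumed from the highest element down, as in the text.
    """
    running = 0     # first register: mu(N-1) + ... + mu(n); ends as the denominator
    weighted = 0    # second register: sum of the running sums; ends as the numerator
    for n in range(len(mu) - 1, -1, -1):
        running += mu[n]
        if n > 0:   # assumption: the last running sum is not accumulated
            weighted += running
    return weighted, running


mu = [(7 * n) % 16 for n in range(64)]   # arbitrary 4-bit test data, not from the source
numerator, denominator = coa_by_repeated_addition(mu)
```

Each value $\mu_A(n)$ is present in exactly $n$ of the accumulated running sums, which is why the second register ends up holding $\sum n\,\mu_A(n)$ without any multiplier.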

In order to achieve higher yield, we allocated 51 data-paths on the chip, and non-functioning 
memory units and data-paths can be isolated from the rest of the chip. The isolation is achieved 
by blowing a fuse using laser technology. Each pair of a memory unit and a data-path can be 
reprogrammed to any other address also by blowing a fuse. This allows a continuous addressing 
of memory/data-paths after removal of a defective unit from a chip. The schematic diagram 
for address removal and re-programming circuit is shown in Figure 7. 

The host processor downloads the rule set and the table for fuzzification at start-up time.
The fuzzy processor looks like a static RAM chip to the host processor. The RAM system,
however, has only a row decoder and no column decoder. A user can address
each row (which corresponds to a clause/action of a rule) through a memory address register.
Each column is addressed by a shift register because data are accessed sequentially. The last
address is reserved and mapped to the status register. This register controls the configuration of
the data-paths and the operational mode (load, run, or test). Fuzzification tables have their own
memory addresses and are loaded in the same way as the rule memory.

The chip is designed for the 1 μm N-well CMOS process of MCNC [7]. It uses a non-overlapping
two-phase clocking scheme. The chip is designed with a target operational speed of 40 MHz.
The chip consists of approximately 614,000 transistors, of which about 470,000 are used to
form the static RAM system. The die size is 7750 μm by 9080 μm, and the chip is packaged in a
standard pin grid array with 84 pins. The supply voltage is 3.0-3.3 V. Table 1 summarizes the
process, device specifications and primary architectural features. Figure 8 shows the layout map
of the chip.

    Die Size                7750 μm x 9080 μm
    No. Transistors         614K (470K RAM)
    No. Pins                84 (16 Power/GND)
    Package Type            PGA (Standard Pad Frame)
    Clock Frequency         40 MHz @ 70°C
    Power Supply            3.0 - 3.3 V
    Power (Est.)            600 mW
    Interface               TTL Compatible
    Modes                   4 inputs/2 outputs/51 rules,
                            2 inputs/1 output/102 rules,
                            test
    Redundancy              Laser Programmable
    Process                 1 μm N-well CMOS
    Gate Length/t_ox        1.0 μm / 22.5 nm
    Poly/Metal 1/Metal 2    2.6/2.6/4.0 μm

Table 1: Summary of circuits

5 System Integration 

For the fuzzy inference chip to be useful, we must package it into a system integrating hardware
and software; hence, the development of hardware and software must be coordinated. We need to
provide a user-friendly interface to control engineers. We have done a substantial amount of work
on the development of the software system. The hardware side of the system integration is in a
preliminary design stage.

5.1 Hardware System 

For the hardware side, we will package the VLSI chip into a single board system. The
single board system should be bus compatible with widely available personal computers or
workstations. Potential candidates are: 1) IBM PC/AT, 2) Sun workstation, 3) IBM Personal
System II, 4) Apple Macintosh II. At this moment, we believe either the IBM PC/AT or the Sun
workstation is most suitable for our purpose. The IBM PC/AT is widely available and is used in
factory automation. On the other hand, we have extensive software on the Sun workstation.

The single board system consists of a VLSI fuzzy logic inference processor chip, logic for a
standard bus interface, A/D converters for inputs, D/A converters for output, and glue logic.
For applications requiring more rules, we can combine multiple fuzzy chips into one inference
processing system. We would only need a small amount of extra glue logic and chip control.
Overall, the single board system is fairly modest and should be easy to construct.

5.2 Software system 

For software system integration, we need a programming environment for developing the
control rules, and software to communicate with and drive the fuzzy logic inference board from a
host processor. We have been developing a system that combines graphic input and text input
in a windowed environment using the X Window System. A window environment is useful for editing
the rule set and for graphic representation of the simulated rule set execution.

5.3 Programming Environment 

As discussed above, the chip’s output is driven by a set of IF-THEN rules. A rule set should 
be easy to develop, test and load into the chip. Our programming environment allows a user
to describe easily a rule set that represents the operating practice of the system that the chip
will control. The user must be able to define membership functions and assign them to IF and 
THEN clauses of the rules. Fuzzy variables which will take on input values during chip operation 
must also be assigned membership functions for the fuzzifying process. The environment allows 
easy simulation and testing of the rule set. The simulated execution is displayed graphically 
for ease of debugging and refinement of the rule set. Finally, the rule set will be down-loaded 
to the Fuzzy Logic Board. 

5.3.1 Editors 

For rule set programming, a multiple window environment provides editing and display facilities 
for the fuzzy rule sets, for fuzzy variables, and for the fuzzy set membership functions used 
in both the fuzzifying process and the representation of the rule clauses. Separate text and 
graphic windows interact with the user and display the developing system in various modes 
and from different levels of abstraction. 

Working in the editors, the user may proceed sequentially or select randomly among the 
items to be defined. Automatic sequential entry allows fast initial setup of prototype rule 
systems. Correction and modification require random access. 

For each of the editors, (fuzzy set membership functions, fuzzy variables, and the fuzzy rule 
set), a text window and a graphics window are available and may be displayed simultaneously. 
Editing may proceed by text input to the text window, or by mouse and keyboard input to the 
graphics window. As changes are made in one window, the corresponding changes will appear 
in the other window as appropriate to that mode. 

A fuzzy set membership function is represented internally as 64 discrete numbers, each
specifying the membership at one point in the universe of discourse. Graphic input of the
corresponding shape may be made by line segments, which are immediately translated into the
step function of discrete values. Figure 9 shows an actual screen of a Sun workstation performing
this task. The function is given a name such as positive medium. Alternatively, a membership
function may be a predefined shape such as a triangle.

Figure 9: Graphic editor
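
The translation from line segments to discrete values might look like the following Python sketch. It assumes the chip's representation described earlier (64 points, 16 levels); the function name `discretize`, the vertex format, the round-half-up rule, and the triangle used for "positive medium" are all hypothetical and are not taken from the editor itself.

```python
def discretize(vertices, size=64, levels=16):
    """Sample a polyline of (position, grade) vertices into quantized membership values.

    Positions are integers in increasing order; grades lie in [0, 1].
    Points not covered by any segment keep membership 0.
    """
    top = levels - 1
    table = [0] * size
    for (u0, g0), (u1, g1) in zip(vertices, vertices[1:]):
        for u in range(max(u0, 0), min(u1, size - 1) + 1):
            t = (u - u0) / (u1 - u0) if u1 != u0 else 0.0
            grade = g0 + t * (g1 - g0)
            table[u] = int(grade * top + 0.5)   # round half up to the nearest level
    return table


# Hypothetical shape: a triangle drawn with two line segments.
positive_medium = discretize([(20, 0.0), (30, 1.0), (40, 0.0)])
```

The result is the 64-entry step function that would be stored for the named membership function.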

A library of fuzzy set membership functions gives the programmer the option of using 
predefined terms for the rule set clauses and fuzzifying functions. Without the need for extensive 
initial definition of terms, prototyping can progress quickly to the simulation stage. The system 
may then be fine-tuned through custom redefinition of terms. Predefined fuzzy set membership 
functions may also be associated with application derived terminology without the need for 
customized function shape specification. Additions and deletions may be made in the library. 

A fuzzy variable is the internal representation of some input or output such as airspeed, 
glide slope, or elevator angle. For processing by the fuzzy system, a single value is represented 
by a membership function over a universe of discourse. Thus a fuzzy variable must be associated 
with a membership function which will fuzzify an input value or represent the output of a rule 
for subsequent output value determination. Using the editor, the associated function may be
laid out in the graphics window or an existing membership function name may be specified in
the text window. The corresponding graphic shape will then appear in the graphics window.

The rule editor has a structured text editor. The user fills in the blanks and is prompted
at the next blank. See Figure 10. The user may move the cursor to any blank and may select
at random the rule currently being edited. As blanks are filled in, the corresponding graphic
shapes will appear in the graphics window. The window configurations on a Sun workstation are
shown in Figure 11.

Figure 10: Rule editor - text window.

5.3.2 Simulation 

The development of control rules is experimental in nature; a trial and error approach is 
customary. Simulation of the rule set system is therefore required. This includes offline, 
software simulation of the behavior of the chip as well as interaction with a program simulating 
the process to be controlled. These simulation processes are integrated with the rule editing 
facilities. The rule set programmer makes changes and views their effects without delay or 
exiting from the system. 

The system graphically displays the inference process of the simulated chip execution within 
the system windows. This facilitates debugging and refinement of the rule set. Rule by rule 
analysis of the simulation is possible as well as monitoring overall behavior. The user selects a 
subset of the rules. This subset, which may be one, some, or all of the rules, can be fired one 
at a time or simultaneously. The effect on the chip output is displayed in a separate window. 
Any subset may also be ‘unfired’, or deleted after firing of some, or all of the rules. The system 
then displays the intermediate or final output that would result absent that rule or subset of 
rules. Again, this unfiring may be done stepwise or simultaneously. 

The system makes the output available at interprocess communication sockets, and similarly
accepts input variable values at sockets. A simulation of a process to be controlled by the
chip may thus be controlled directly from the rule set programming environment. The actual
operation of the current rule set on the controlled process may be monitored and the rule set
changed immediately in response to this simulation.

Figure 11: Window configuration.

5.4 Device Driver 

Driving the chip is fairly simple. It is done by downloading a rule set and setting the chip to
run mode. At execution time, the chip can communicate with A/D and D/A converters either
directly or through a host. To the host, the fuzzy logic chip looks like a static RAM chip. It
has the usual R/W and enable pins. Downloading of the rule set is done using address and data
registers.

AN APPLICATION OF FUZZY LOGIC TO ROBOTIC VISION AND CONTROL

Abstract 

A robot arm system able to manipulate a moving object on a belt conveyor at 
various speeds is built, consisting of two parts. The first part is related 
to recognizing patterns in real time. In this part, a method of construct- 
ing a fuzzy discriminant tree is proposed, where three newly defined 
measures called effectiveness, importance, and applicability are introduced. 
The robot arm system is able to recognize the shape and the size of moving 
patterns on a belt conveyor based on the fuzzy discriminant tree. The 
second part is to replace (grasp and put) a moving object based on fuzzy
inference (or approximate reasoning) rules with the aid of an image processing
technique. The whole system is controlled by one 16-bit personal
computer and works in real time. The advantages of the proposed method are
the reduction of processing time and the availability of low-level devices 
which have not been realized by other methods. 

Fig. 6. Twelve patterns used in the shape recognition experiment.

(b) Fuzzy labels of distance between object and robot hand (L)

(c) Fuzzy labels of (estimated) moving distance (P)

Table 1 Fuzzy labels of H, A, P

(b) Fuzzy labels of A

    negative                                                                    positive
     -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9  10
A1     1   1  .9  .6  .2  .1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
A2     0  .1  .2  .4  .8   1   1  .8  .4  .2  .1   0   0   0   0   0   0   0   0   0   0
A3     0   0   0   0   0   0  .1  .2  .6  .9   1  .9  .6  .2  .1   0   0   0   0   0   0
A4     0   0   0   0   0   0   0   0   0   0  .1  .2  .4  .8   1   1  .8  .6  .2  .1   0
A5     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  .1  .2  .6  .9   1   1

(c) Fuzzy labels of P

OPTICAL NEURAL COMPUTERS

Abstract 

A neural computer consists of a large number of simple processing elements 
(neurons) that are densely interconnected. The information that is needed 
to solve a particular problem is stored in the strength of the interconnec- 
tions using a learning procedure. Some of the basic characteristics of such 
a computer and the class of problems for which it is best suited are dis- 
cussed. Optics is a technology particularly well suited for implementing 
neural computers because of the relative ease with which a programmable, 
massive interconnection network can be optically synthesized. Several 
experimental demonstrations of optical networks will be described and the 
ultimate capabilities of optical neural computers will be projected. 

WHAT IS THE SIGNIFICANCE OF NEURAL NETWORKS FOR AI?

Abstract

Associative memory (AM) and attentive associative memory (AAM) have been
reviewed in terms of simple neural networks (both uniform and nonuniform
matched filter banks - read by inner products and written by outer products
in parallel). Whereas AM has been applied to optical character recognition
(OCR) using the set of orthogonal feature vectors deduced from image processing
and computer vision, AAM can incorporate AI expert system techniques
for determining the nonuniform linear combination of outer products. A
rule-based system can more efficiently incorporate the frequency distribution
of distorted characters according to user group profiles; i.e., left-handed
versus right-handed writing. Specifically, in this paper we have
examined the degree of fault tolerance in AM, the ability of generalization
by interpolation (auto-associative memory), and abstraction by extrapolation
(hetero-associative memory). The efficiency of the closed system of rule-based
knowledge representation of AI using tuple storage has been combined
with the flexibility of the non-rule-based open system using the matrix
knowledge representation of NI (coined for either neural, or network, or
natural intelligence). Thus, the ability of generalization and abstraction
becomes possible in a combined intelligent system of AI and NI.


WHAT IS THE SIGNIFICANCE OF NEURAL NETWORKS FOR AI ? 


by Harold H. Szu 

Naval Research Laboratory, Code 5756 
Washington D.C. 20375 

ABSTRACT 

Associative memory (AM) and attentive associative memory (AAM) have been reviewed 
in terms of simple neural networks (both uniform and non-uniform matched filter banks: read by 
inner products and write by outer products in parallel). While AM has been applied to the optical 
character recognition (OCR) using the set of orthogonal feature vectors deduced from image 
processing and computer vision, AAM can incorporate AI expert system techniques for 
determining the non-uniform linear combination of outer products. A rule-based system can 
more efficiently incorporate the frequency distribution of distorted characters according to user 
group profiles, say left-handed writing versus right-handed writing. Specifically in this paper, we 
have examined the degree of fault tolerance in AM, the ability of generalization by interpolation 
(auto-associative memory) and abstraction by extrapolation (hetero-associative memory). The 
efficiency of the closed system of rule-based knowledge representation of AI using the tuple storage 
has been combined with the flexibility of the non-rule based open system using the matrix 
knowledge representation of NI (coined for either Neural, or Network, or Natural Intelligence). 
Thus, the ability of generalization and abstraction becomes possible in a combined intelligent 
system of AI and NI. 

1. INTRODUCTION 

The question of the significance of neural networks for AI may be subdivided into three
aspects.


(i) How can neural networks help solve AI problems ? 

ANSWER: Both the well understood fault-tolerance of associative memory (AM), and the 
lesser understood ability of neural networks for generalization and abstraction, can be usefully 
incorporated into AI techniques. 

(ii) How can AI help solve neural network problems ? 

ANSWER: Similar to computer aided design, AI expert systems with neural network
modules can help design special purpose architectures for neural network computing.

(iii) What unsolved problems can be solved efficiently by combining AI and NI (coined for
either Neural, or Network, or Natural Intelligence) techniques to utilize their respective strengths?

ANSWER: The optical character recognition (OCR) problem of reading hand-written bank checks
and zip-codes can be solved by combining both AI and NI techniques, as described in this paper.


Because we can only build a small neural network, we wish to endow a small set of
neurons with a human-like intelligence. With present technology, whether it be electronic or
optical, one cannot build a neural network of more than several hundred neurons using existing
processor elements (PE's), because of the technological difficulty associated with dense
interconnectivity, about $N^2$ for $N$ PE's. Thus, artificial neural networks cannot yet match the
size and the complexity of the human brain, which has billions of neurons and thousands of
interconnects for each neuron. If we are not overly ambitious in developing a general purpose
neural computer, we can build a special purpose neural computer for solving special purpose
problems, such as OCR.

One way to accomplish this special purpose neural computer is to combine the traditional
rule-based AI wisdom with non-rule-based NI learning. This is particularly desirable in solving
OCR problems because the available small neural networks can use better feature vectors obtained
from other disciplines. Neural networks, built with current technology, can then provide fault
tolerance for variations in the input feature vectors. The specific problem of hand-written
character recognition differs from the more regular, hand-printed, alphanumeric recognition
problem in that it must account for such complications as connected characters and characters
broken by segmentation.

Conceptually, one could solve the OCR problem using analytic, rule-based AI, or neural
network techniques. The OCR problem can be subdivided into character (or character string)
statistics, font recognition, and character recognition; the most efficient techniques for these
three subproblems are analytic (statistical), rule-based AI, and neural networks, respectively.
Since the statistical technique applied to alphanumeric frequencies is well known, this topic
will not be discussed further. In solving the font recognition subproblem, AI rules can be set by
the (statistical) frequency distribution of individual distorted characters according to user
group profiles, e.g. left-handed writing versus right-handed writing. It is efficient to design an
AI expert system that draws upon classical statistical pattern recognition, e.g. one stroke
difference exists between "P" and "R", or between "O" and "Q", or, from a low pass filter
viewpoint, only one stroke location difference exists among the four rounded letters "P", "R",
"O", and "Q". Furthermore, the rules of pair character distortion distribution can help solve
the problem of connected characters and broken characters after segmentation, such as two scripted
zeros. The pair character correlation matrix can be analyzed by the technique of the
Karhunen-Loeve procedure in image processing. The Karhunen-Loeve technique is compatible with
AM's outer product decomposition. With the help of an AI rule-based system, both the first and the
second order statistics can be incorporated in the formalism of attentive associative memory (AAM),
which possesses the extra degrees of freedom of the non-uniform storage of vector outer products
based on a given set of critical feature vectors.

Because the open-ended knowledge of input pattern variations may be efficiently
controlled by using knowledge from other disciplines, such as AI and computer vision, with a
resulting better combined technology, we shall review AM and AAM, and various OCR approaches by
means of the specific techniques they use for feature extraction and for group
classification. The sooner we accept the implementation limitations of the present neurocomputers,
the better we can work with researchers in other disciplines, for example AI, computer vision,
and image processing. Since this cross disciplinary collaboration is by nature not easy because
of the different trainings and languages involved, this paper may serve as
a door opener for both.

Pattern recognition researchers have been more successful in machine-printed character
recognition (CR) than in optical character recognition (OCR) of hand-written bank checks or
zip-codes. Difficulties of applying AI alone to an intelligent OCR may be due to the lack of a
non-rule-based capability of generalization and abstraction. This may be constrained by the
traditional AI one dimensional (1-D) knowledge representation, e.g. an ordered set of tuples used
in semantic networks. Similarly, difficulties of applying the neural network alone to an
intelligent OCR may lie in selecting critical features, which is precisely one of the most
challenging and unsolved problems (others are segmentation and location). On the other hand, AI is
efficient in reducing the problem to a sub-problem based on 1-D knowledge representation of simple
rules, and NI provides the fault-tolerant OCR system based on 2-D knowledge representation.
Together they give the possibility of generalization and abstraction. Thus, Szu and Tan (1988)
have considered a less risky approach that brings together the traditional AI researchers who know
about OCR critical features and the neural network experts who know about AM fault tolerance.
Technological developments have pointed to the readiness of such collaborations, since 2-D storage
by chips or optical disks is becoming cheaper than the traditional 1-D content addressable memory
(CAM) processor. What is needed is a smart coprocessor such as a neurocomputer. As a matter of
fact, due to the 2-D nature of light, optical expert systems based on AM have been designed by Szu
and Caulfield (1987), who have shown that, with a simple replacement of 1-D tuples by 2-D matrices
in a semantic network, the alias problem for data fusion is solved by matrix addition and
thresholding. The opto-electronic implementation of the attentive associative memory model of
Athale, Szu & Friedlander (1986) can be expanded by means of a priori probability compiled by a
pair-character correlation function of script letters. These papers may give both sides a
starting line for collaboration.

In this paper, we have reviewed the orthogonal subspaces of features and examined (1) the
degree of fault tolerance, (2) the generalization by interpolation to other orthogonal feature
vectors within the subspace, and (3) the abstraction by extrapolation to other subspaces. AAM may
be formulated by a linear combination of outer products based on a set of orthogonal feature
vectors. The combination coefficient is called the attention parameter, because it enters into the
eigenvalue of the AAM matrix that governs the recall convergence. We briefly review the dynamics of
attentive associative memory, published elsewhere by Szu (1988) using arbitrary coefficients. In
this paper we explicitly introduce an M-tuple for the attention vector
$a = \{a_n,\ n = 1, \ldots, M\}$, where the inner product of a fixed memory state $|m\rangle$ with
the difference vector between that state and an averaged stochastic input $|Q\rangle$ is naturally
used as the attention parameter, defined in terms of Dirac's inner product notation:
$a_m = \langle m|m\rangle - \langle m|Q\rangle$. Such an AAM matrix has a non-white eigenvalue
spectrum $\lambda_n = a_n - (A/B)$, where the attentive memory capacity is
$A = \sum_{n=1}^{M} a_n$, and $B$ is the length of the feature vectors (e.g. the number of bits).
Iterative recalls are used. Paying non-uniform attention ($a_n > 1$) increases the memory capacity
$A > M$ together with a faster convergence rate proportional to the larger eigenvalue
$\lambda_n \geq \lambda$ than a uniform attention (i.e. $a_m = 1$). Szu's (1988) analysis has
suggested that the eigenvalue spectrum and its dithering by input ensemble can play a crucial role
for the convergence associated with a nonlinear dynamical system.
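
The non-white spectrum $\lambda_n = a_n - (A/B)$ can be checked numerically with a short Python sketch. The matrix form used here, $B\,T = \sum_n a_n |n\rangle\langle n| - A\,I$, is an assumption chosen so that it reduces to the uniform case when every $a_n = 1$; the text does not spell the matrix out. The two bipolar feature vectors, the attention values, and the function names are likewise hypothetical.

```python
def aam_matrix(features, attention):
    """Assumed AAM matrix: T = (sum_n a_n |n><n| - A*I) / B for bipolar feature vectors."""
    B = len(features[0])      # number of bits per feature vector
    A = sum(attention)        # attentive memory capacity
    T = [[sum(a * v[i] * v[j] for v, a in zip(features, attention))
          for j in range(B)] for i in range(B)]
    for i in range(B):
        T[i][i] -= A
    return [[entry / B for entry in row] for row in T]


def mat_vec(T, v):
    """Multiply matrix T by column vector v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


# Two orthogonal bipolar feature vectors with B = 4 (illustrative choice).
features = [[1, 1, -1, 1], [1, -1, 1, 1]]
attention = [1.5, 0.5]        # non-uniform attention, A = 2

T = aam_matrix(features, attention)
recalled = [mat_vec(T, v) for v in features]
eigenvalues = [a - sum(attention) / 4 for a in attention]   # a_n - A/B
```

With these values the first vector is reproduced with eigenvalue 1.0 and the second with eigenvalue 0.0, so the vector given more attention dominates the recall.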


2. Associative Memory 


Matrix associative memory works like a parallel bank of matched filters, but much more
efficiently on at least three counts: (1) no address coding of input and decoding of output is
necessary, (2) operations are done in parallel, and (3) the connectivity matrix can be determined
by itself using various adaptive (learning) algorithms.

An analytical and numerical example of AM is given as follows: 

We denote $M$ feature vectors as binary words, $U^{(m)}$, $m = 1, \ldots, M$. Each word has $B$
bits. The inner product of Eq (1) measures the norm, the number of bits that are one.

$$ U^T \cdot U = \#\ \text{of ones} \qquad (1) $$

where the superscript $T$ transposes the column vector to a row vector.

The associated bipolar words, denoted by $V^{(m)}$, $m = 1, \ldots, M$, are defined as follows:

$$ V = (2U - \mathbf{1}) = \mathrm{Sgn}(U) \qquad (2) $$

where the unit vector $\mathbf{1}$ has all entries equal to 1 and Sgn is the sign function that
changes zero and negative quantities to -1. We prefer the bipolar version to the binary version
because: (1) the inner product norm is always identical to the number of bits, $B$:

$$ V^T \cdot V = B = \langle V | V \rangle, \qquad (3) $$

rewritten here in terms of Dirac's bracket notation, $\langle \text{bra} | \text{ket} \rangle$ for
the inner and $| \text{ket} \rangle \langle \text{bra} |$ for the outer product; (2) the nature of
"exclusive or" can be easily represented by bipolar multiplication,

$$ (+1)(+1) = 1, \quad (-1)(-1) = 1, \quad (+1)(-1) = -1, \quad (-1)(+1) = -1; $$

and (3) the inner product norm is related to the Hamming distance, defined to be the number of
different bits between two vectors no matter where the differences occur.
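
A few lines of Python make the three properties concrete. The relation used in the last line, inner product $= B - 2 \times$ (Hamming distance), is the standard identity for bipolar vectors and is supplied here as an illustration; the two 4-bit words and the function names are arbitrary choices, not taken from the text at this point.

```python
def to_bipolar(bits):
    """Eq (2): map binary entries {0, 1} to bipolar entries {-1, +1}."""
    return [2 * b - 1 for b in bits]


def inner(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def hamming(x, y):
    """Number of positions in which two vectors differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


u1 = [1, 1, 0, 1]             # arbitrary 4-bit binary words
u2 = [1, 0, 1, 1]
v1, v2 = to_bipolar(u1), to_bipolar(u2)

binary_norm = inner(u1, u1)   # counts the ones, Eq (1)
bipolar_norm = inner(v1, v1)  # always B, Eq (3)
overlap = inner(v1, v2)       # equals B - 2 * hamming(u1, u2)
```

Two bipolar words are therefore orthogonal exactly when they differ in half of their bits.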

We assume an orthogonal set of feature vectors defined as follows:

$$ V^{(n)T} \cdot V^{(m)} = B\,\delta_{nm} = \langle n | m \rangle \qquad (4) $$

where $\delta_{nm}$ is the Kronecker delta. The outer product weight matrix $W$ represents an
associative memory:

$$ [W] = \sum_m \left[ V^{(m)} V^{(m)T} \right] = \sum_m | m \rangle \langle m | \qquad (5) $$


Hopfield (1982, 1984) assumed the auto-associative matrix $[T]$ to be traceless. That was used
together with the symmetry property to prove convergence. Thus, the second term, the Kronecker
delta matrix (1's along the main diagonal and zero elsewhere), is introduced in Eq (6) to make it
traceless.

$$ B\,[T]_{ij} = [W]_{ij} - M\,\delta_{ij} \qquad (6) $$

$B$ is the normalization constant, and $M$ is the memory capacity. Using the trace operation denoted
by Tr, we can easily verify Eq (6) to be traceless:

$$ \mathrm{Tr}(| m \rangle \langle m |) = B \qquad (7) $$

$$ \mathrm{Tr}([\delta_{ij}]) = B \qquad (8) $$

Summing Eq (7) over the $M$ stored vectors gives $\mathrm{Tr}[W] = MB$, which cancels the $MB$
contributed by the second term.

The tradeoff between the memory capacity and the degree of fault-tolerance has been estimated to
be about 15% of B bits [Hopfield (1982)] for pseudo-orthogonal vectors. That is,

M = 0.15 B (9) 


For orthogonal feature vectors, however, the capacity is 100 %. 

M= B (10) 

This fact can be demonstrated by the eigenvalue problem of the matrix which is defined to be 

    $[T]\,| n \rangle = \lambda_n\,| n \rangle$    (11)

where the eigenvalue can be easily verified, using Eqs. (4) and (6), to be degenerate, namely, a
white spectrum for all M states,

    $\lambda_n = 1 - (M/B)$    (12)

The full capacity, M = B, corresponds to a zero eigenvalue for all B orthogonal eigenstates, one for 
each feature vector. 
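
As a minimal numerical check of Eqs. (6) to (12), the sketch below builds the traceless matrix for two of the 4-bit orthogonal words used later in the text and confirms the degenerate eigenvalue 1 - (M/B). The function names `build_T` and `apply` are ours, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def build_T(stored):
    """Eq. (6): [T] = (sum_m |m><m| - M*delta) / B, with exact fractions."""
    B, M = len(stored[0]), len(stored)
    return [[Fraction(sum(v[i] * v[j] for v in stored) - (M if i == j else 0), B)
             for j in range(B)] for i in range(B)]

def apply(T, x):
    """Matrix-vector product [T]|x>."""
    return [sum(t * xj for t, xj in zip(row, x)) for row in T]

# The four mutually orthogonal 4-bit bipolar words of the example below.
words = {13: [1, 1, -1, 1], 11: [1, -1, 1, 1], 7: [-1, 1, 1, 1], 14: [1, 1, 1, -1]}
B = 4

stored = [words[13], words[11]]                 # M = 2 stored words
T = build_T(stored)
trace = sum(T[i][i] for i in range(B))          # Eqs. (7)-(8): the trace vanishes
lam = 1 - Fraction(len(stored), B)              # Eq. (12)
is_eigen = all(apply(T, v) == [lam * x for x in v] for v in stored)

T_full = build_T(list(words.values()))          # M = B: full capacity
full_is_zero = all(t == 0 for row in T_full for t in row)
print(trace, lam, is_eigen, full_is_zero)
```

At full capacity the outer products of a complete orthogonal set sum to B times the identity, so the traceless matrix vanishes identically, which is the zero eigenvalue mentioned above.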

Consider a simple example where B = 4. There are 4 possible orthogonal vectors and $2^4$ =
16 possible words denoted by:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.


We introduce orthogonal subspaces defined by the number p of contiguous 1's in the binary
word. The subspace consisting of words 13, 11, 7, and 14 is obviously orthogonal: its words are
generated by shifting a single "zero" among 3 ones (equivalently, in the complement words, a "one"
among 3 zeroes) from one end of the word to the other.


Word   Binary   Complement word   Binary   p

 13    1101            2          0010     3
 11    1011            4          0100     3
  7    0111            8          1000     3
 14    1110            1          0001     3
 15    1111            0          0000     4
  6    0110            9          1001     2
 12    1100            3          0011     2
 10    1010            5          0101     1


Word   Bipolar         Complement word   Bipolar         p

 13    +1 +1 -1 +1            2          -1 -1 +1 -1     3
 11    +1 -1 +1 +1            4          -1 +1 -1 -1     3
  7    -1 +1 +1 +1            8          +1 -1 -1 -1     3
 14    +1 +1 +1 -1            1          -1 -1 -1 +1     3
 15    +1 +1 +1 +1            0          -1 -1 -1 -1     4
  6    -1 +1 +1 -1            9          +1 -1 -1 +1     2
 12    +1 +1 -1 -1            3          -1 -1 +1 +1     2
 10    +1 -1 +1 -1            5          -1 +1 -1 +1     1



It is readily verified that the bipolar words of the subspace (13, 11, 7, 14) are mutually
orthogonal to one another, as shown in Figure 1. They happen to be related to the Walsh functions
of periodicity p = 3. The corresponding binary words have an equal angle among them, $\cos^{-1}(2/3)$,
that is not 90°. The bipolar words of the second subspace (15, 6, 12, 10) are also mutually orthogonal, but the
two subspaces are not orthogonal to each other.
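
These statements are easy to verify by direct computation. The sketch below is illustrative only: the helpers `word_bits`, `bipolar`, `inner` and `mutually_orthogonal` are our own names, and bits are read most significant first, as in the table.

```python
import math

def word_bits(n, B=4):
    """Binary digits of n, most significant bit first."""
    return [(n >> (B - 1 - k)) & 1 for k in range(B)]

def bipolar(n, B=4):
    """Bipolar form of word n via V = 2U - 1."""
    return [2 * b - 1 for b in word_bits(n, B)]

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def mutually_orthogonal(words):
    """True if every pair of bipolar words has a zero inner product."""
    vs = [bipolar(w) for w in words]
    return all(inner(vs[i], vs[j]) == 0
               for i in range(len(vs)) for j in range(i + 1, len(vs)))

first, second = [13, 11, 7, 14], [15, 6, 12, 10]
first_ok = mutually_orthogonal(first)
second_ok = mutually_orthogonal(second)

# Inner products between words taken from different subspaces.
cross = [inner(bipolar(a), bipolar(b)) for a in first for b in second]

# Angle between the binary (not bipolar) words 13 and 11.
u, w = word_bits(13), word_bits(11)
cos_angle = inner(u, w) / math.sqrt(inner(u, u) * inner(w, w))
print(first_ok, second_ok, cross, cos_angle)
```

Every cross inner product comes out nonzero, which is the sense in which the two subspaces are not orthogonal to each other.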



Figure 1. Two-Dimensional Representation of Walsh Base Functions 
Used to illustrate the fault tolerance and generalization properties 
of Associative Memory 


We consider the storage of one word in memory. 


    $4\,[T_1] \equiv [13] = | 13 \rangle \langle 13 | - \delta$    (13)


If the outer product is properly normalized, it is related to the projection operator:

    $[P] = \delta - | 13 \rangle \langle 13 |\,(1/B)$    (14)

Using Eq. (4), it can be verified that

    $[P]^2 = [P]$.    (15)
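
For completeness, the verification of Eq. (15) can be written out as follows. This is our own expansion of the step the text leaves to the reader; it uses only Eq. (14) and the norm $\langle 13 | 13 \rangle = B$ from Eq. (4).

```latex
\begin{aligned}
[P]^2 &= \Bigl(\delta - \tfrac{1}{B}\,|13\rangle\langle 13|\Bigr)^2 \\
      &= \delta - \tfrac{2}{B}\,|13\rangle\langle 13|
         + \tfrac{1}{B^2}\,|13\rangle\langle 13|13\rangle\langle 13| \\
      &= \delta - \tfrac{2}{B}\,|13\rangle\langle 13|
         + \tfrac{1}{B}\,|13\rangle\langle 13|
         \qquad (\langle 13|13\rangle = B) \\
      &= \delta - \tfrac{1}{B}\,|13\rangle\langle 13| = [P].
\end{aligned}
```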

We will show (1) the ability of fault tolerance, and (2) the ability for generalization. 

Fault Tolerance 

The following sequence of successively erasing (zeroing out) bits of the bipolar word illustrates
tolerance of missing bits.

(1) one missing bit

    $[13]\,(0\ \ 1\ -1\ \ 1)^T = \mathrm{Sgn}(3\ \ 2\ -2\ \ 2)^T = | 13 \rangle$    (16)

where Sgn is the sign function representing the sigmoid neuron response by the point nonlinearity,
extracting the algebraic sign of each entry.

(2) two missing bits

    $[13]\,(0\ \ 0\ -1\ \ 1)^T = \mathrm{Sgn}(2\ \ 2\ -1\ \ 1)^T = | 13 \rangle$    (17)

(3) three missing bits

    $[13]\,(0\ \ 0\ \ 0\ \ 1)^T = \mathrm{Sgn}(1\ \ 1\ -1\ \ 0)^T = | 12 \rangle$

    $[13]^2\,(0\ \ 0\ \ 0\ \ 1)^T = \mathrm{Sgn}(1\ \ 1\ -1\ \ 3)^T = | 13 \rangle$    (18)

(4) four missing bits

    $[13]\,(0\ \ 0\ \ 0\ \ 0)^T = \mathrm{Sgn}(0\ \ 0\ \ 0\ \ 0)^T = (-1\ -1\ -1\ -1)^T = | 0 \rangle$

    $[13]^2\,(0\ \ 0\ \ 0\ \ 0)^T = \mathrm{Sgn}(-1\ -1\ \ 3\ -1)^T = | 2 \rangle$

    $[13]^3\,(0\ \ 0\ \ 0\ \ 0)^T = \mathrm{Sgn}(-3\ -3\ +3\ -3)^T = | 2 \rangle$    (19)

which converges to a fixed point that is precisely the bipolar complement to $| 13 \rangle$. In other words,
the phase information is lost as an overall minus sign in the last case.
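
The missing-bit sequence of Eqs. (16) to (19) can be reproduced with a few lines of code. The sketch below is only an illustration under the conventions stated in the text: the storage is the unnormalized traceless matrix [13] of Eq. (13), recall is synchronous, and Sgn maps zero to -1. The names `memory`, `recall` and `to_word` are ours.

```python
def sgn(x):
    """Sign function used in the text: zero and negative values map to -1."""
    return 1 if x > 0 else -1

def memory(stored):
    """Unnormalized traceless storage, e.g. [13] = |13><13| - delta (Eq. 13)."""
    B = len(stored[0])
    return [[sum(v[i] * v[j] for v in stored) - (len(stored) if i == j else 0)
             for j in range(B)] for i in range(B)]

def recall(T, x, steps):
    """Apply x <- Sgn(T x) `steps` times and return every intermediate state."""
    states = []
    for _ in range(steps):
        x = [sgn(sum(t * xj for t, xj in zip(row, x))) for row in T]
        states.append(x)
    return states

def to_word(v):
    """Decimal label of a bipolar vector (most significant bit first)."""
    return int("".join("1" if b > 0 else "0" for b in v), 2)

T13 = memory([[1, 1, -1, 1]])
cases = [([0, 1, -1, 1], 1),    # one missing bit, Eq. (16)
         ([0, 0, -1, 1], 1),    # two missing bits, Eq. (17)
         ([0, 0, 0, 1], 2),     # three missing bits, Eq. (18)
         ([0, 0, 0, 0], 3)]     # four missing bits, Eq. (19)
results = {tuple(cue): [to_word(s) for s in recall(T13, cue, steps)]
           for cue, steps in cases}
for cue, trajectory in results.items():
    print(cue, trajectory)
```

The last trajectory ends on word 2, the bipolar complement of word 13, in agreement with the remark about the lost overall sign.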

The following sequence of successively reversing bits of the bipolar word illustrates tolerance
of erroneous bits.


(1) one erroneous bit

    $[13]\,(-1\ \ 1\ -1\ \ 1)^T = \mathrm{Sgn}(3\ \ 1\ -1\ \ 1)^T = | 13 \rangle$    (20)

(2) two erroneous bits

    $[13]\,(-1\ -1\ -1\ \ 1)^T = \mathrm{Sgn}(1\ \ 1\ \ 1\ -1)^T = | 14 \rangle$

    $[13]^2\,(-1\ -1\ -1\ \ 1)^T = \mathrm{Sgn}(-1\ -1\ -1\ \ 1)^T = | 1 \rangle$

    $[13]^3\,(-1\ -1\ -1\ \ 1)^T = \mathrm{Sgn}(1\ \ 1\ \ 1\ \ 1)^T = | 15 \rangle$

    $[13]^4\,(-1\ -1\ -1\ \ 1)^T = \mathrm{Sgn}(1\ \ 1\ -3\ \ 1)^T = | 13 \rangle$    (21)

(3) three erroneous bits

    $[13]\,(-1\ -1\ \ 1\ \ 1)^T = \mathrm{Sgn}(-1\ -1\ \ 1\ -3)^T = | 2 \rangle$    (22)

which also converges to a fixed point that is the bipolar complement of $| 13 \rangle$.
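
A single recall step for the one-bit and three-bit cases, Eqs. (20) and (22), can be sketched as follows. The code is illustrative only; it uses the identity [13] x = <13|x> |13> - x that follows from Eq. (13), and the helper names `apply_13` and `flip_first` are ours.

```python
def sgn(x):
    """Sign function with Sgn(0) = -1, as in the text."""
    return 1 if x > 0 else -1

V13 = [1, 1, -1, 1]   # bipolar word 13

def apply_13(x):
    """One recall step with [13] = |13><13| - delta, i.e. Sgn(<13|x> |13> - x)."""
    overlap = sum(a * b for a, b in zip(V13, x))
    return [sgn(overlap * a - b) for a, b in zip(V13, x)]

def flip_first(v, k):
    """Reverse the sign of the first k bits of a bipolar word."""
    return [-b if i < k else b for i, b in enumerate(v)]

one_error = apply_13(flip_first(V13, 1))      # cue (-1, 1, -1, 1), Eq. (20)
three_errors = apply_13(flip_first(V13, 3))   # cue (-1, -1, 1, 1), Eq. (22)
print(one_error, three_errors)
```

With one reversed bit the stored word is restored in a single step, while three reversed bits lead to its complement, as stated above.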

Generalization within a subspace 

We consider the ability to recognize a new vector that is different from the stored vectors.
In other words, an AM can recognize related vectors that have not been memorized before. By
recognition, we mean convergence to a different fixed point. In this sense, we say that the AM can
generalize its memory to include other fixed points.

In the case of bipolar vectors, if and only if a new vector x is orthogonal to the stored
vectors, associative recall "converges in a cycle of two" as defined in the following iterations:

    $\mathrm{Sgn}([T]\,| x \rangle) = -| x \rangle$    (23a)

    $\mathrm{Sgn}(-[T]\,| x \rangle) = +| x \rangle$    (23b)

This necessary and sufficient condition allows us to determine efficiently the orthogonality
between a new vector and all the stored vectors.

We shall show that when a new vector $| 11 \rangle$ is presented to the AM [13], due to the
orthogonality between $| 13 \rangle$ and $| 11 \rangle$ and the traceless property of [13],

    $[13]\,| 11 \rangle = \mathrm{Sgn}(-| 11 \rangle) = | 4 \rangle$, and

    $[13]^2\,| 11 \rangle = | 11 \rangle$.    (24)

Once the system has acknowledged the second vector $| 11 \rangle$, it is incorporated into the
matrix storage.

    $4\,[T_2] \equiv [13,11] = [13] + [11]$

    $\qquad = | 13 \rangle \langle 13 | + | 11 \rangle \langle 11 | - 2\delta$    (25)

If another vector, $| 7 \rangle$, is presented,

    $[13,11]\,| 7 \rangle = \mathrm{Sgn}(-2\,| 7 \rangle) = | 8 \rangle$, and

    $[13,11]^2\,| 7 \rangle = \mathrm{Sgn}(4\,| 7 \rangle) = | 7 \rangle$.    (26)

Thus, we enlarge the memory storage to have three memorized states.

    $4\,[T_3] \equiv [13,11,7] = [13] + [11] + [7] = | 13 \rangle \langle 13 | + | 11 \rangle \langle 11 | + | 7 \rangle \langle 7 | - 3\delta$    (27)

This process is continued until the 4-bit orthogonal subspace (p = 3) is filled up.

    $4\,[T_4] = [13] + [11] + [7] + [14]$    (28)

We have demonstrated the ability to include other orthogonal vectors that have not been 
stored before. This example also shows the important consequence of traceless storage through its 
contribution to the "generalization by interpolation within the orthogonal subspace". 
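
The enlargement procedure of Eqs. (23) to (28) can be sketched in code. This is an illustration under our own assumptions, not the authors' implementation: candidates are tested one at a time with the cycle-of-two criterion and appended to the storage when they pass; the names `step` and `is_cycle_of_two` are hypothetical.

```python
def sgn(x):
    """Sign function with Sgn(0) = -1, as in the text."""
    return 1 if x > 0 else -1

def step(stored, x):
    """One recall step with the traceless memory sum_m |m><m| - M*delta."""
    M = len(stored)
    overlaps = [sum(a * b for a, b in zip(v, x)) for v in stored]
    return [sgn(sum(o * v[i] for o, v in zip(overlaps, stored)) - M * x[i])
            for i in range(len(x))]

def is_cycle_of_two(stored, x):
    """Eq. (23): the first recall gives -x and the second recall gives x back."""
    first = step(stored, x)
    return first == [-b for b in x] and step(stored, first) == x

stored = [[1, 1, -1, 1]]                          # start with word 13 only
candidates = {11: [1, -1, 1, 1], 7: [-1, 1, 1, 1], 14: [1, 1, 1, -1]}
accepted = []
for name, vec in candidates.items():
    if is_cycle_of_two(stored, vec):
        stored.append(vec)                        # enlarge the memory, Eqs. (25)-(28)
        accepted.append(name)

# Word 15 is not orthogonal to word 13 and is therefore not accepted by [13].
rejects_15 = not is_cycle_of_two([[1, 1, -1, 1]], [1, 1, 1, 1])
print(accepted, rejects_15)
```

Each test costs one pair of matrix-vector recalls against the whole storage rather than a separate inner product per stored vector, which is the efficiency argument made in the following paragraph.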

Given a table of orthogonal vectors, one may argue that computing inner products will 
also determine orthogonality. However, inner products must be done pairwise among all vectors 
and become inefficient as the number of vectors gets large. The above method remains efficient for 
all sizes. 


One may furthermore argue that the difficulty is not how to construct an orthogonal set, but how
to select critical bipolar features from gray-scale, imperfect images.

Algorithms for Constructing a Critical Feature:

We shall not rely on the auto-AM to select features. One can carry out one's favorite
image processing procedure to extract a set of gray-scale feature vectors, { $| F \rangle$ }. Bipolar feature
vectors are preferred in AM because of the demonstrated fault tolerance and the special ability of the
traceless outer product that allows a quick convergence to a fixed point of cycle two. Given a
gray-scale feature vector $| F \rangle$, several procedures for generating a bipolar feature vector are given. The
first procedure is "bipolarization", i.e.,

    $| f \rangle = \mathrm{Sgn}(| F \rangle - \mathrm{threshold})$    (29)

The second procedure is to use the Walsh transform. We apply the two-dimensional Walsh
transform (as an orthogonal bipolar vector space { $| w_j \rangle$ }) to all gray-scale features. We select a
bipolar feature vector from a specific Walsh base vector that is associated with the maximum
coefficient in the Walsh transform.

    $| f \rangle = \mathrm{Sgn}(\mathrm{Max}_j\,(| w_j \rangle \langle w_j | F \rangle) - \mathrm{threshold})$    (30)

where the orthonormality condition of Walsh base vectors is inserted to relate to the first method,

    $\sum_j | w_j \rangle \langle w_j | = [1]$    (31)
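
The first two procedures can be sketched as follows. Several choices here are our own assumptions rather than the authors' method: a one-dimensional Walsh basis is generated by the Sylvester (Hadamard) construction in natural ordering instead of the two-dimensional transform, the mean gray level is removed before the transform so that the constant base vector does not dominate, and the gray-scale values in `F` are hypothetical.

```python
def sgn(x):
    """Sign function with Sgn(0) = -1, as in the text."""
    return 1 if x > 0 else -1

def hadamard(n):
    """Sylvester construction of a 2**n x 2**n matrix with orthogonal +/-1 rows."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def bipolarize(F, threshold):
    """Eq. (29): |f> = Sgn(|F> - threshold), applied entry by entry."""
    return [sgn(x - threshold) for x in F]

def walsh_feature(F, threshold=0.0):
    """Eq. (30), as read here: keep the base vector with the largest coefficient."""
    W = hadamard(2)                                   # four base vectors of 4 bits
    mean = sum(F) / len(F)
    centered = [x - mean for x in F]                  # assumption: remove the mean
    coeffs = [sum(w * x for w, x in zip(row, centered)) for row in W]
    j = max(range(len(W)), key=lambda k: coeffs[k])
    return [sgn(coeffs[j] * w - threshold) for w in W[j]], coeffs

F = [200, 60, 180, 40]                                # hypothetical gray levels
f_threshold = bipolarize(F, sum(F) / len(F))          # threshold at the mean
f_walsh, coeffs = walsh_feature(F)
print(f_threshold, f_walsh, coeffs)
```

For this input both procedures return the bipolar word (+1, -1, +1, -1), i.e. word 10 of the table; for noisier inputs the Walsh route snaps the feature onto a member of an orthogonal set, which plain thresholding does not guarantee.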

The third and the fourth procedures are to extract from the arbitrary feature vector $| G \rangle$
the closest vector $| g \rangle$ from either the bipolar orthogonal feature set { $| N \rangle$ } or the set { $| F \rangle$ } using the
following traceless associative memory storage.

    $| g \rangle = \mathrm{Sgn}([\sum | N \rangle \langle F |]\,| G \rangle - \mathrm{threshold})$    (32)

    $| g \rangle = \mathrm{Sgn}(\sum c_F\,[| F \rangle \langle F |]\,| G \rangle - \mathrm{threshold})$    (33)

The linear combination coefficients { $c_F$ } may be determined by the statistics of single
character distortions and variances (similar to finding the normal modes that diagonalize the
covariance matrix and the Karhunen-Loeve orthogonal procedure used for outer product
representation of 2-D imagery). Furthermore, the statistics of character pair distortions, such as
two scripted zeros, could be used to determine the coefficients so as to resolve the problem of
recognizing connected characters and broken characters after segmentation. We will not go into
details in this approach, because of its problem-dependent nature.

The mechanism to select critical features is given as follows.

(1) A human being picks a critical feature (picture) among the set of distorted, handwritten
characters, e.g., the extra stroke among O, P, Q.

(2) Walsh transform the selected feature.

(3) Pick the Walsh function that has the largest transform value.

We choose a feature vector that is closest to the Walsh vector associated with the largest Walsh
transform coefficient, and the rest follows from the procedure described in Eqs. (24)-(28). We call such
a set of features the critical features.

Lessons to be learned about applying associative memory to pattern recognition: 

AM can only do so much. There is no way to judge the correctness of an associative recall
except by the convergence to a fixed point. One can only assign meaning to those fixed points,
whether they are new or old. The proven capabilities of the AM model are (1) the recovery of missing
and erroneous bits, and (2) the creation of new orthogonal vectors, as illustrated above. Therefore, to
apply AM to pattern recognition, one must apply human interpretations to those capabilities.

Since learning is by trial and error, it is a continuous process. Suppose that a feature
vector with many components representing many features (such as leg-feature and fur-feature, etc.,
for a tiger, coded fully as $| 13 \rangle$) has been memorized by the traceless outer product. Furthermore,
suppose that only certain features are known in a sequence of imperfect input vectors (i.e., some
feature values are missing; e.g., the first in the sequence is (0, 0, 1, 1)). Then, the AM can fill in
the missing bits. After three iterations, one finds $(-1, -1, 1, -1)^T = | 2 \rangle$. One can then enlarge the
traceless outer product memory to include both vectors, [13, 2]. One examines the second input
vector (0, 0, 1, 1). One can verify that the enlarged memory can indeed recall the vector $| 2 \rangle$,
which corresponds to, say, a lady, rather than a tiger. The AM "mental" capacity of recognizing
other distinct objects when they show up has been demonstrated. Following this line of thought,
different subspaces of different sizes could be assigned to different classes of objects related by a
hetero-associative memory of a rectangular matrix. Such a recognition of different classes requires
a complete feature set coded in the AM. It can fill all orthogonal subspaces by the "generalization
procedure" illustrated in Eqs. (24)-(28).

3. ATTENTIVE ASSOCIATIVE MEMORY 


Recently, Amari et al. have studied the dynamics of such a system, for which we will give a
simple theorem. We summarize our model equations as follows:


    $\langle n | m \rangle = B\,\delta_{n,m}$    (34)

    $[T]\,| n \rangle = \lambda_n\,| n \rangle$    (35)

The simple model of attentive associative memory [T] is a linear combination of outer products
based on the set of orthogonal feature vectors, { $| n \rangle$, n = 1, ..., M }, and a cue of initial state $| Q \rangle$ that
determines the set of attention parameters { $a_n$ } as follows:

    $a_n = \langle n | n \rangle - \langle n | Q \rangle$    (36)

    $B\,[T]_{ij} = \sum_{n=1}^{M} a_n\,| n_i \rangle \langle n_j | - A\,[\delta_{ij}]$    (37)

that is traceless, $\mathrm{Tr}\,\delta_{ij} = \mathrm{Tr}\,| n_i \rangle \langle n_j | = B$, giving

    $A = \sum_{n=1}^{M} a_n$    (38)

and

    $\lambda_n = a_n - (A/B)$    (39)

The attentive memory capacity A and eigenvalue $\lambda_n$ are reduced to Hopfield's memory capacity M
and a degenerate eigenvalue $\lambda$ in the case of uniform attention (i.e., $a_n = 1$),

    $A = M, \quad \lambda = 1 - r$    (40)

where Amari's pattern ratio r = (M/B) is defined for M bipolar words (states) of B bits (neurons)
each.
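
Equations (37) to (39) can be checked numerically in the same way as the uniform case. The sketch below is illustrative: the attention values are hypothetical numbers chosen by us rather than values computed from a cue through Eq. (36), and `attentive_T` is our own helper name.

```python
from fractions import Fraction

def attentive_T(stored, attention):
    """Eq. (37): [T] = (sum_n a_n |n><n| - A*delta) / B with A = sum_n a_n."""
    B = len(stored[0])
    A = sum(attention)
    return [[Fraction(sum(a * v[i] * v[j] for a, v in zip(attention, stored))
                      - (A if i == j else 0), B)
             for j in range(B)] for i in range(B)]

stored = [[1, 1, -1, 1], [1, -1, 1, 1], [-1, 1, 1, 1]]   # words 13, 11, 7
attention = [2, 1, 1]                                     # hypothetical a_n
B, A = len(stored[0]), sum(attention)                     # Eq. (38)

T = attentive_T(stored, attention)
trace = sum(T[i][i] for i in range(B))

eigenvalues = []
for a, v in zip(attention, stored):
    Tv = [sum(t * x for t, x in zip(row, v)) for row in T]
    lam = a - Fraction(A, B)                              # Eq. (39)
    assert Tv == [lam * x for x in v]                     # |n> is an eigenvector
    eigenvalues.append(lam)

# Uniform attention recovers the degenerate eigenvalue 1 - M/B of Eq. (12).
uniform = attentive_T(stored, [1, 1, 1])
uniform_lam = 1 - Fraction(len(stored), B)
print(trace, eigenvalues, uniform_lam)
```

The non-uniform choice splits the white spectrum: the attended word gets a larger eigenvalue than the others, which is what the convergence argument below relies on.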


The dynamics is assumed to be governed by the matrix-vector inner product

    $Q(t + 1) = \mathrm{Sgn}([T]\,Q(t))$    (41)

where the point nonlinearity function is defined as Sgn(x) = +1 if x > 0, and -1 if x ≤ 0. The
successive associative recall gives the iteration, indexed by t = 0, 1, 2, ..., such that Q(t) = Q when
t = 0. The eigenvalue spectrum, not the distance alone, is a proper macroscopic parameter to explain the
transient dynamical behaviors of the recalling process. In particular, the direction cosine

    $S_m(t) = \langle m | Q(t) \rangle / \langle m | m \rangle$    (42)

has been derived and the logarithmic derivative is given by

    $(d/dt)\,\log(1 - S_m(t)) < \log(\lambda_m / 2) < 0$    (43)

Convergence to a specific m-th state is guaranteed if the m-th eigenvalue ($\lambda_m$) is bounded by $2 > \lambda_m$.

Theorem 1 about the lower bound says that paying attention (i.e., non-uniform $a_n$)
always increases the memory capacity, $A = \sum_{n=1}^{M} a_n > M$, with a faster convergence
proportional to the eigenvalue $\lambda_m > \lambda = 1 - r$.

We conjecture that the statistical neurodynamics of associative memory may have similar behavior
to the deterministic dynamics of attentive associative memory with a non-white eigenvalue
spectrum due to random initial conditions that change with respect to the initial guess vector $| Q(t) \rangle$,
t = 0. The difference vector between $| Q(t) \rangle$ and $| m \rangle$ has an inner product norm defined as

    $2 D_m(t) = \langle m | m \rangle - \langle m | Q(t) \rangle$    (44)

If we assume that paying attention to the initial small guess error $2 D_m(0)$ amounts to choosing a
nonuniform and biased storage

    $a_m = 2 D_m(0) > 1$    (45)

and all other coefficients to be identical to 1,

    $a_n = 1, \quad n \neq m$ ,    (46)

then, by definition,

    $A = M + 2 D_m(0) - 1$.    (47)

Theorem 2 about the upper bound of $\lambda_m$ assumes that if a small difference vector between
the input $| Q \rangle$ and the specific state $| m \rangle$ is used as the attention parameter $a_m$, Eq(31a), then the
critical relationship between Amari's pattern ratio r and the initial error is analytically found
for successful recalls.

    $2 D_m(0) < 2 + (M + 1)/(B - 1)$    (48)

The maximum permissible Hamming distance $D_H$ from the desired m-th state, to be
reached after iterative recalls, is given by the formula

    $D_H < (B/2) - 1 - [(M - 1)/2(B + 1)]\,((B/2) - 1 - (r/2))$    (49)

4. CONCLUSION

Associative memory (AM) works like a matched filter, but does so efficiently. It should not
be applied to the image domain directly. Rather, it should be applied to the feature domain so that a
relatively small AM can do useful tasks with the present technology.

We shall not rely on the auto-AM to select features. Instead, features should be selected 
using human judgement. However, auto-AM will help us find critical features and hetero- 
associative memory can perform feature extraction efficiently. 

There exists a large body of knowledge pertaining to features selection and extraction and 
pattern classification for traditional optical character recognition in the literature. This body of 
knowledge should be tapped and coupled with associative memory. One should not rule out 
the use of traditional classification techniques (such as syntactical) as extraction of high-level 
features which then become part of the input feature vector to an AM. 

Classical pattern recognition has been demonstrated with a relatively greater success in 
machine-printed character recognition compared to handprinted character recognition. 
Difficulty may be rooted in the lack of generalization and abstraction due to machine's limited 
one-dimensional knowledge representation. In principle, AM should be able to complement 
traditional OCR with 2-D knowledge representation. Various degrees of abstraction can be 
achieved through a multi-layer, two-dimensional AM architecture. Note that the present 
technology has evolved to the point where 2-D memory (chip or optical disk) is not more 
expensive than 1-D memory storage with logic unit tree content addressable memory 
processor. 


In conclusion, we can combine traditional wisdom in traditional OCR with a simple AM
implementable in present technology to form a human-intelligence-endowed neural
network.

Character segmentation is an important step in character recognition. Fukushima has
developed a neural network model (selective attention) for character segmentation in the
Neocognitron [Fukushima (1987)]. The attentive associative memory model implemented
opto-electronically by Athale, Szu & Friedlander (1986) can be augmented by a priori probabilities
compiled by a character-pair correlation function of connected characters. This is an interesting
area for more research.

Inputs to associative memory are linear vectors whereas inputs to OCR are rectangular
arrays. Can associative memory replicate the concept of (2-D) neighborhood? The
two-dimensional transform that preserves the neighborhood relationship should be used for image
pre-processing before applying AM to the pattern. For example, a 2-D Walsh transform can give
a 1-D base Walsh vector (associated with the largest coefficient) as the input feature vector to the
AM.

Can AM perform syntactical parsing [Ali and Pavlidis (1977)] or rule-based structural
analysis [D'Amato (1982)]? Any traditional classification technique can be used to extract
high-level features for AM.

How can AM extract position and rotation invariant features? [cf. Szu (1986), Messner and
Szu (1987)].

One difficulty in applying the backpropagation network has been the network size-scaling
problem. One way to circumvent it has been to extract a small number of features as input
[Burr (1987), Gullichsen and Chang (1987)]. Recent advances by Ballard in 1987 permit partial
connectivity between two successive layers, which avoids the combinatorial explosions often
encountered when the input layer is directly connected to image pixels. Thus, spatial pattern
relationships can be efficiently preserved in such a network while coarse-graining between
successive layers can desensitize pattern variation in input images.

An AI extension of the simple AM model is attentive associative memory (AAM), which
allows us to apply AI to pay a non-uniform attention to each term of outer product storage, i.e.,
a linear combination of outer products in which the set of combination coefficients is
determined by an AI rule-based system, e.g., the frequency distribution of distorted characters
according to user group profiles, e.g., left-hand writing versus right-hand writing. The efficiency
of the closed system of rule-based knowledge representation of AI using the tuple storage is
combined with the flexibility of the non-rule-based open system using the matrix knowledge
representation of NI (coined for either neural, or network, or natural intelligence). Thus, the
ability of generalization and abstraction becomes possible for AI, and is demonstrated in a
combined intelligent system of AI & NI. We can endow a simple neural network architecture
based on a small set of neurons with a human-like intelligence by combining the traditional
rule-based AI wisdom with non-rule-based learning. This is achievable because OCR requires
better feature vectors obtained from other disciplines in the sense of fault tolerance that neural
networks built with the present technology can already provide.

Appendix: Generic Definition of Neural Networks 

Associative memory is a special model of neural networks. Examples of associative 
recalls from partial images and the success of nonlinear signal processing are recorded in the 
literature [cf. Kohonen (1984)]. An axiomatic definition is outlined as follows. 

We shall define three kinds of neurons: fine-grained, medium-grained and large-grained 
processor elements (PEs). A fine-grained PE, represented by the lower case word neuron , has 
no internal memory analogous to neurons in the hippocampus part of the brain that is 
responsible for fault-tolerant associative recall. A medium-grained PE, Neuron, has a built-in 
memory analogous to Neurons in biological sensory and motor control which are responsible 
for reactions to approaching danger. A large-grained PE, NEURON , has built-in memory, 
control logic, and communication capabilities equivalent to a computer. NEURONs occur in 
nature in the form of grandmother cells or pacer/conductor cells. 

These three types of neurons and their associated circuits have four kinds of interactions: 
(1) exciting, (2) inhibiting, (3) bursting, (4) grading and delaying transmission. In general they 
follow the law of the middle response or the sigmoid function (hyperbolic tangent or logistic 
functions) to amplify weak signals with a nonlinear quick rising function and suppress strong 
signals with a nonlinear tapering off saturation function. The generic definition of a Neural 
Network is a system which is: 

1. Non-linear ≈ sigmoid function ≈ point non-linearity (hard limiting) shown as
follows:

[Figure: sigmoid function and hard-limiting point non-linearity]

2. Non-local ≈ weighted outer product ≈ outer product (white spectrum) shown as
follows:

[Figure: outer product storage]

3. Non-stationary ≈ piecewise time stationary ≈ iterative algorithm shown as follows:

[Figure: iterative algorithm]

4. Non-convex ≈ constrained global optimization ≈ simulated annealing schematically
shown as follows:

[Figure: simulated annealing]

5. Other attributes yet to be discovered.

These successive approximations of the four "non-" principles, indicated by wiggly equal
signs in (1-4), make possible the unveiling of the complex and nonlinear neural (brain)
behavior. This is possible with the use of powerful computers and more accurate models of
intelligent functions. The theory is amenable to numerical simulations due to piecewise
linear, regionally local, temporarily stationary, and locally convex approximations.

Three decades ago, Rosenblatt and co-workers built the perceptron solely based upon the
first attribute (nonlinearity) with stochastic implementations. Thus, with hindsight, it was not
surprising that Minsky and Papert could show a limited utility and propose useful alternative
artificial intelligence (AI) rule-based systems. AI works in closed systems where rules govern,
while neural intelligence (NI) works in open systems where rules have yet to be discovered.
Various exploitations of these efforts in neural networks are:




The term wet-ware, coined by Carver Mead, is neither software nor hardware, but more like
Hecht-Nielsen's net-ware based on non-programmable but trainable networks. A special version
of layered neural networks has been demonstrated with the ability of phonetic interpolation in
the Rumelhart, Sejnowski connectionist networks, such as Net-Talk, Boltzmann and Cauchy
Machines, and error back propagation networks.
NEURAL MODELING OF SELECTIVE ATTENTION
Abstract 

A neural network is presented in which there are modifiable, bidirectional 
connections between nodes representing sensory events and other nodes repre- 
senting reinforcement sources. There is also competition between sensory 
nodes. Through these competitive and associative mechanisms, the presenter, 
together with Stephen Grossberg, has simulated some data on attentionally
modulated Pavlovian conditioning. In particular, if two stimuli are pre- 
sented simultaneously, and one of them has already been associated with a 
primary reinforcer (such as electric shock or food), selective attention 
occurs which inhibits the other stimulus from forming new associations. 
Context changes can profoundly alter the dynamics of selective attention. 

For example, if one stimulus has been paired with a reinforcer and that 
stimulus combined with another is paired with a greater or lesser amount of 
that reinforcer, the second stimulus is no longer blocked. Also, selective 
attention based on positive or negative reinforcement can compete with 
selective attention based on other criteria. Nonmotivational criteria are
enhanced by frontal lobe damage, which weakens the sensory-reinforcement 
linkage. For example, a frontally lesioned monkey can prefer a novel object 
to one that has previously been rewarded. Also, a human frontal lobe 
patient can persevere in a habit that was once, but is no longer, rewarding. 
APPLIED AND REAL NEURAL NETWORKS:

A COORDINATED AND INTERDEPENDENT INVESTIGATION OF BOTH 

Abstract 

One and a half years ago, Caltech organized a new graduate program in 
Computation and Neural Systems (CNS) . This program involves 15 faculty 
members with interests as diverse as statistical physics, concurrent 
computing, analog VLSI, signal processing, optical computing, machine vision,
robotics, and the neurobiological study of numerous real neural systems.

The Bower laboratory is an integral part of the CNS program with our primary 
interest being the coordinated study of information processing in real 
neural networks. The principal approach taken is one that views these 
networks as systems of complex processing elements having functions that are 
intimately related to their specific distributed architectures. Within the 
lab, our multidisciplinary approach includes standard anatomical and 
physiological investigations linked to computer modeling techniques. In 
addition, we are developing new experimental techniques which directly 
address computational issues in real neural structures. For example, we are 
using modern silicon manufacturing technology to make multisite brain
recording electrodes which capture the activity of multiple, functionally 
related neurons. We have also constructed a general-purpose neural network 
simulator with interactive graphics (CAD/CAM for neural networks) that runs 
on concurrent computers. Finally, we have been exploring the use of applied 
neural networks in recognizing and categorizing recorded signals from real 
neurons. Each of these efforts is described. This work is sponsored by the 
Whitaker Foundation, the Joseph Drown Foundation, and Lockheed Corporation. 
ABSTRACT

Based on anatomical and physiological data, we have developed a computer simulation of piri- 
form (olfactory) cortex which is capable of reproducing spatial and temporal patterns of actual 
cortical activity under a variety of conditions. Using a simple Hebb-type learning rule in conjunc- 
tion with the cortical dynamics which emerge from the anatomical and physiological organiza- 
tion of the model, the simulations are capable of establishing cortical representations for differ- 
ent input patterns. The basis of these representations lies in the interaction of sparsely distribut- 
ed, highly divergent/convergent interconnections between modeled neurons. We have shown that 
different representations can be stored with minimal interference, and that following learning 
these representations are resistant to input degradation, allowing reconstruction of a representa- 
tion following only a partial presentation of an original training stimulus. Further, we have 
demonstrated that the degree of overlap of cortical representations for different stimuli can 
also be modulated. For instance similar input patterns can be induced to generate distinct cortical 
representations (discrimination), while dissimilar inputs can be induced to generate overlapping 
representations (accommodation). Both features are presumably important in classifying olfacto- 
ry stimuli. 


INTRODUCTION 

Piriform cortex is a primary olfactory cerebral cortical structure which receives 
second order input from the olfactory receptors via the olfactory bulb (Fig. 1). It 
is believed to play a significant role in the classification and storage of olfactory 
information 1,2,3 . For several years we have been using computer simulations as a 
tool for studying information processing within this cortex 4,5 . While we are
ultimately interested in higher order functional questions, our first modeling objective
was to construct a computer simulation which contained sufficient neurobiological 
detail to reproduce experimentally obtained cortical activity patterns. We believe 
this first step is crucial both to establish correspondences between the model and 
the cortex, and to assure that the model is capable of generating output that can 
be compared to data from actual physiological experiments. In the current case, 
having demonstrated that the behavior of the simulation at least approximates 
that of the actual cortex 4 (Fig. 3), we are now using the model to explore the 
types of processing which could be carried out by this cortical structure. In partic- 
ular, in this paper we will describe the ability of the simulated cortex to store and 
recall cortical activity patterns generated by various stimulus conditions. We
believe this approach can be used to provide experimentally testable hypotheses 
concerning the functional organization of this cortex which would have been diffi- 
cult to deduce solely from neurophysiological or neuroanatomical data. 



Fig. 1. Simplified block diagram of the olfactory system and closely related structures. 


MODEL DESCRIPTION 

This model is largely instructed by the neurobiology of piriform cortex 3 . Axon- 
al conduction velocities, time delays, and the general properties of neuronal inte- 
gration and the major intrinsic neuronal connections approximate those currently 
described in the actual cortex. However, the simulation reduces both the number 
and complexity of the simulated neurons (see below). As additional information 
concerning these or other important features of the cortex is obtained it will be
incorporated in the model. Bracketed numbers in the text refer to the relevant
mathematical expressions found in the appendix. 

Neurons. The model contains three distinct populations of intrinsic cortical 
neurons, and a fourth set of cells which simulate cortical input from the olfactory 
bulb (Fig. 2). The intrinsic neurons consist of an excitatory population of pyramidal
neurons (which are the principal neuronal type in this cortex), and two populations
of inhibitory interneurons. In these simulations each population is modeled
as 100 neurons arranged in a 10x10 array (the actual piriform cortex of the rat
contains on the order of 10^6 neurons). The output of each modeled cell type consists
of an all-or-none action potential which is generated when the membrane
potential of the cell crosses a threshold [2.3]. This output reaches other neurons 
after a delay which is a function of the velocity of the fiber which connects them 
and the cortical distance from the originating neuron to each target neuron [2.0, 
2.4]. When an action potential arrives at a destination cell it triggers a conduc- 
tance change in a particular ionic channel type in that cell which has a characteris- 
tic time course, amplitude, and waveform [2.0, 2.1]. The effect of this conductance 
change on the transmembrane potential is to drive it towards the equilibrium 
potential of that channel. Na+, Cl-, and K+ channels are included in the model.
These channels are differentially activated by activity in synapses associated with 
different cell types (see below). 
Fig. 2. Schematic diagram of piriform cortex showing an excitatory pyramidal cell and two
inhibitory interneurons with their local interactions. Circles indicate sites of synaptic
modifiability.


Connection Patterns. In the olfactory system, olfactory receptors project to the 
olfactory bulb which, in turn, projects directly to the piriform cortex and other olfac- 
tory structures (Fig. 1). The input to the piriform cortex from the olfactory bulb is 
delivered via a fiber bundle known as the lateral olfactory tract (LOT). This fiber 
tract appears to make sparse, non-topographic, excitatory connections with pyra- 
midal and feedforward inhibitory neurons across the extent of the cortex 3,6 . In the 
model this input is simulated as 100 independent cells each of which makes random
connections (p=0.05) with pyramidal and feedforward inhibitory neurons
(Fig. 1 and 2). 

In addition to the input connections from the olfactory bulb, there is also an 
extensive set of connections between the neurons intrinsic to the cortex (Fig. 2). 
For example, the association fiber system arises from pyramidal cells and makes 
sparse, distributed excitatory connections with other pyramidal cells all across the 
cortex 7,8,9 . In the model these connections are randomly distributed with 0.05 
probability. In the model and in the actual cortex, pyramidal cells also make
excitatory connections with nearby feedforward and feedback inhibitory cells.
These interneurons, in turn, make reciprocal inhibitory connections with the
group of nearby pyramidal cells. The primary effect of the feedback inhibitory
neurons is to inhibit pyramidal cell firing through a Cl- mediated current
shunting mechanism 10,11,12. Feedforward interneurons inhibit pyramidal cells
via a long latency, long duration, K+ mediated hyperpolarizing potential 12,13.
Pyramidal cell axons also constitute the primary output of both the model and
the actual piriform cortex.


Synaptic Properties and Modification Rules. In the model, each synaptic con- 
nection has an associated weight which determines the peak amplitude of the con- 
ductance change induced in the postsynaptic cell following presynaptic activity 
[2.0]. To study learning in the model, synaptic weights associated with some of 
the fiber systems are modifiable in an activity-dependent fashion (Fig. 2). The 
basic modification rule in each case is Hebb-like; i.e. change in synaptic strength 
is proportional to presynaptic activity multiplied by the offset of the postsynaptic 
membrane potential from a baseline potential. This baseline potential is set 
slightly more positive than the Cl- equilibrium potential associated with the
shunting feedback inhibition. This means that synapses activated while a destination
cell is in a depolarized or excited state are strengthened, while those activated 
during a period of inhibition are weakened. In the model, synapses which follow 
this rule include the association fiber connections between excitatory pyramidal 
neurons as well as the connections between inhibitory neurons and pyramidal neu- 
rons. Whether these synapses are modifiable in this way in the actual cortex is a 
subject of active research in our lab. However, the model does mimic the actual 
synaptic properties associated with the input pathway (LOT) which we have 
shown to undergo a transient increase in synaptic strength following activation 
which is independent of postsynaptic potential 15 . This increase is not permanent 
and the synaptic strength subsequently returns to its baseline value. 
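
The Hebb-like rule above can be written in a few lines. The sketch below is only illustrative: the function name, the learning rate and the voltages in mV are assumptions, since the text gives the form of the rule but not its constants.

```python
def hebb_update(weight, pre_activity, v_post, v_baseline, rate):
    """Return the weight after one Hebb-like step.

    The change is proportional to presynaptic activity multiplied by the
    offset of the postsynaptic membrane potential from the baseline.
    """
    return weight + rate * pre_activity * (v_post - v_baseline)

# Hypothetical baseline, placed slightly above an assumed Cl- equilibrium potential.
V_BASELINE = -65.0  # mV

# Synapse active while the target cell is depolarized: strengthened.
w_depolarized = hebb_update(1.0, 1.0, v_post=-50.0, v_baseline=V_BASELINE, rate=0.01)
# Synapse active while the target cell is inhibited: weakened.
w_inhibited = hebb_update(1.0, 1.0, v_post=-68.0, v_baseline=V_BASELINE, rate=0.01)
# No presynaptic activity: unchanged.
w_silent = hebb_update(1.0, 0.0, v_post=-50.0, v_baseline=V_BASELINE, rate=0.01)
```

Placing the baseline just above the inhibitory equilibrium potential is what makes activation during shunting inhibition count as a negative offset.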

Generation of Physiological Responses. Neurons in the model are represented 
as first-order "leaky" integrators with multiple, time-varying inputs [1.0]. During
simulation runs, membrane potentials and currents as well as the time of
occurrence of action potentials are stored for comparison with actual data. An
explicit compartmental model (5 compartments) of the pyramidal cells is used to 
generate the spatial current distributions used for calculation of field potentials 
(evoked potentials, EEGs) [3.0, 4.0]. 
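
As a rough illustration of a first-order leaky integrator with conductance inputs, the sketch below integrates c_m dV/dt = (E_rest - V)/r_leak + sum_k g_k (E_k - V) with a forward Euler step. It assumes constant conductances for simplicity (the model uses time-varying ones), and all names, units and parameter values are hypothetical.

```python
def simulate_leaky_integrator(conductances, e_rev, e_rest=-70.0, r_leak=1.0,
                              c_m=1.0, dt=0.01, steps=2000):
    """Euler-integrate c_m dV/dt = (e_rest - V)/r_leak + sum_k g_k (e_rev_k - V).

    conductances and e_rev are parallel lists, one entry per input type.
    Returns the membrane potential trace, starting at rest.
    """
    v = e_rest
    trace = [v]
    for _ in range(steps):
        i_leak = (e_rest - v) / r_leak
        i_syn = sum(g * (e - v) for g, e in zip(conductances, e_rev))
        v += dt * (i_leak + i_syn) / c_m
        trace.append(v)
    return trace

# One excitatory input with an assumed equilibrium potential of +55 mV.
trace = simulate_leaky_integrator(conductances=[4.0], e_rev=[55.0])
# With no synaptic conductance the cell stays at rest.
rest_trace = simulate_leaky_integrator(conductances=[0.0], e_rev=[55.0])
```

Each open channel pulls the membrane potential toward its own equilibrium potential, and the steady state is the conductance-weighted average of the resting and equilibrium potentials.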

Stimulus Characteristics. To compare the responses of the model to those of 
the actual cortex, we mimicked actual experimental stimulation protocols in the 
simulated cortex and contrasted the resulting intracellular and extracellular 
records. For example, shock stimuli applied to the LOT are often used to elicit 
characteristic cortical evoked potentials in vivo 16,17,18 . In the model we simulated 
this stimulus paradigm by simultaneously activating all 100 input fibers. Another 
measure of cortical activity used most successfully by Freeman and colleagues 
involves recording EEG activity from piriform cortex in behaving animals 19,20 . 
These odor-like responses were generated in the model through steady, random 
stimulation of the input fibers. 

To study learning in the model once the physiological measures were established,
we needed more refined stimulation procedures. In the absence of any specific
information about actual input activity patterns along the LOT, we constructed
each stimulus out of a randomly selected set of 10 out of the 100 input fibers.
Each stimulus episode consisted of bursts of activity in this subset of fibers,
each burst lasting 10 msec and recurring at 25 msec intervals to simulate the
40 Hz periodicity of the actual olfactory bulb input. This pattern of activity was
repeated in trials of 200 msec duration, which roughly corresponds to the theta
rhythm periodicity of bulbar activity and respiration 21,22. Each trial was then
presented 5 times for a total exposure time of 1 second (cortical time). During
this period the Hebb-type learning rule could be used to modify the connection
weights in an activity-dependent fashion.
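
The stimulus timing above can be sketched as follows. The helper names are hypothetical and time is handled at 1 msec resolution, which is an assumption made only for this illustration.

```python
import random

def make_stimulus(n_fibers=100, n_active=10, rng=None):
    """Pick the random subset of input fibers that defines one stimulus."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(n_fibers), n_active))

def burst_onsets(trial_ms=200, period_ms=25):
    """Onset times (ms) of the bursts within one trial."""
    return list(range(0, trial_ms, period_ms))

def is_active(t_ms, burst_ms=10, period_ms=25):
    """True if the stimulus fibers are firing at time t_ms within a trial."""
    return (t_ms % period_ms) < burst_ms

stimulus_a = make_stimulus(rng=random.Random(1))
onsets = burst_onsets()
active_ms_per_trial = sum(is_active(t) for t in range(200))
total_exposure_ms = 5 * 200  # five 200 msec trials
```

A 25 msec burst period gives the 40 Hz rhythm, and a 200 msec trial holds eight bursts, so the fibers are active for 80 msec of every trial.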

Output Measure for Learning. Given that the sole output of the cortex is in the 
form of action potentials generated by the pyramidal cells, the output measure of 
the model was taken to be the vector of spike frequency for all pyramidal neurons 
over a 200 msec trial, with each element of the vector corresponding to the firing 
frequency of a single pyramidal cell. Figures 5 through 8 show the 10 by 10 array 
of pyramidal cells. The size of the box placed at each cell position represents the 
magnitude of the spike frequency for that cell. To evaluate learning effects, overlap 
comparisons between response pairs were made by taking the normalized dot 
product of their response vectors and expressing that value as a percent overlap 
(Fig. 4). 
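
The overlap measure can be sketched as a normalized dot product of two spike-frequency vectors. The function name and the handling of an all-silent response are assumptions made for this example.

```python
import math

def percent_overlap(resp_a, resp_b):
    """Normalized dot product of two response vectors, as a percentage.

    Each vector holds one spike frequency per pyramidal cell. A silent
    response (zero vector) is treated as having no overlap with anything.
    """
    dot = sum(a * b for a, b in zip(resp_a, resp_b))
    norm = (math.sqrt(sum(a * a for a in resp_a))
            * math.sqrt(sum(b * b for b in resp_b)))
    if norm == 0.0:
        return 0.0
    return 100.0 * dot / norm

same = percent_overlap([5.0, 0.0, 10.0, 0.0], [5.0, 0.0, 10.0, 0.0])
disjoint = percent_overlap([5.0, 0.0, 10.0, 0.0], [0.0, 7.0, 0.0, 3.0])
scaled = percent_overlap([5.0, 0.0, 10.0, 0.0], [10.0, 0.0, 20.0, 0.0])
```

Because the measure is normalized, it depends on which cells fire and in what proportion, not on the overall firing rate: doubling every rate leaves the overlap at 100%.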

Fig. 3. Simulated physiological responses of the model compared with actual cortical
responses. Upper: Simulated intracellular response of a single cell to paired stimulation
of the input system (LOT) (left) compared with actual response (right) (Haberly &
Bower, '84). Middle: Simulated extracellular response recorded at the cortical surface
to stimulation of the LOT (left), compared with actual response (right) (Haberly,
'73b). Lower: Simulated EEG response recorded at the cortical surface to odor-like
input (left); for actual EEG see Freeman 1978.


Computational Requirements. All simulations were carried out on a Sun 
Microsystems 3/260 model microcomputer equipped with 8 Mbytes of memory and 
a floating point accelerator. Average time for a 200 msec simulation was 3 cpu 
minutes. 


RESULTS 


Physiological Responses 


As described above, our initial modeling objective was to accurately simulate 
a wide range of activity patterns recorded, by ourselves and others, in piriform 
cortex using various physiological procedures. Comparisons between actual and 
simulated records for several types of response are shown in Fig. 3. In general,
the model replicated known physiological responses quite well (the analysis of the
physiological results is described in detail in Wilson et al., in preparation). For
example, in response to shock stimulation of the input pathway (LOT), the model
reproduces the principal characteristics of both the intracellular and
location-dependent extracellular waveforms recorded in the actual cortex 9,17,18
(Fig. 3).

Fig. 4. Convergence of the cortical response during training with a single stimulus with synaptic
modification.

Fig. 5. Reconstruction of cortical response patterns with partially degraded stimuli. Left:
Response, before training, to the full stimulus (left) and to the same stimulus with 50% of the
input fibers inactivated (right). There is a 44% degradation in the response. Right: Response
after training, to the full stimulus (left), and to the same stimulus with 50% of the input
fibers inactivated (right). As a result of training, the degradation is now only 20%.

Fig. 6. Storage of multiple patterns. Left: Response to stimulus A after training. Middle:
Response to stimulus B after training on A followed by training on B. Right: Response to
stimulus A after training on A followed by training on B. When compared with the original
response (left) there is an 85% congruence.

Further, in response to odor-like stimulation the model exhibits 40 Hz oscillations 
which are characteristic of the EEG activity in olfactory cortex in awake, behaving 
animals 19 . Although beyond the scope of the present paper, the simulation also 
duplicates epileptiform 9 and damped oscillatory 16 type activity seen in the cortex 
under special stimulus or pharmacological conditions 4 . 

Learning 

Having simulated characteristic physiological responses, we wished to 
explore the capabilities of the model to store and recall information. Learning in 
this case is defined as the development of a consistent representation in the activ- 
ity of the cortex for a particular input pattern with repeated stimulation and synap- 
tic modification. Figure 4 shows how the network converges, with training, on a 
representation for a stimulus. Having demonstrated that, we studied three proper- 
ties of learned responses - the reconstruction of trained cortical response patterns 
with partially degraded stimuli, the simultaneous storage of separate stimulus 
response patterns, and the modulation of cortical response patterns independent 
of relative stimulus characteristics. 

Reconstruction of Learned Cortical Response Patterns with Partially Degrad- 
ed Stimuli. We were interested in knowing what effect training would have on the 
sensitivity of cortical responses to fluctuations in the input signal. First we pre- 
sented the model with a random stimulus A for one trial (without synaptic modifi- 
cation). On the next trial the model was presented with a degraded version of A 
in which half of the original 10 input fibers were inactivated. Comparison of the 
responses to these two stimuli in the naive cortex showed a 44% variation. Next, 
the model was trained on the full stimulus A for 1 second (with synaptic modifica- 
tion). Again, half of the input was removed and the model was presented with the 
degraded stimulus for 1 trial (without synaptic modification). In this case the
difference between cortical responses was only 20% (Fig. 5), showing that training
increased the robustness of the response to degradation of the stimulus.

Fig. 7. Results of merging cortical response patterns for dissimilar stimuli. Left: Response to
stimulus A and stimulus B before training. Stimuli A and B do not activate any input fibers in
common but still have a 27% overlap in cortical response patterns. Right: Response to
stimulus A and stimulus B after training in the presence of a common modulatory input E1. The
overlap in cortical response patterns is now 46%.

Storage of Two Patterns. The model was first trained on a random stimulus A 
for 1 second. The response vector for this case was saved. Then, continuing with 
the weights obtained during this training, the model was trained on a new non- 
overlapping (i.e. different input fibers activated) stimulus B. Both stimulus A and 
stimulus B alone activated roughly 25% of the cortical pyramidal neurons with 25% 
overlap between the two responses. Following the second training period we 
assessed the amount of interference in recalling A introduced by training with B 
by presenting stimulus A again for a single trial (without synaptic modification). 
The variation between the response to A following additional training with B and
the initially saved response to A alone was less than 15% (Fig. 6), demonstrating
that learning B did not substantially interfere with the ability to recall A.

Modulation of Cortical Response Patterns. It has been previously demon- 
strated that the stimulus evoked response of olfactory cortex can be modulated by 
factors not directly tied to stimulus qualities, such as the behavioral state of the 
animal 1 - 20,23 . Accordingly we were interested in knowing whether the representa- 
tions stored in the model could be modulated by the influence of such a "state" 
input. 

One potential role of a "state" input might be to merge the cortical response 
patterns for dissimilar stimuli, an effect we refer to as accommodation. To test this
in the model, we presented it with a random input stimulus A for 1 trial. It was
then presented with a random input stimulus B (non-overlapping input fibers).
The amount of overlap in the cortical responses for these untrained cases was
27%. Next, the model was trained for 1 second on stimulus A in the presence of an
additional random "state" stimulus E1 (activity in a set of 10 input fibers distinct
from both A and B). The model was then trained on stimulus B in the presence of
the same "state" stimulus E1. After training, the model was presented with
stimulus A alone for 1 trial and stimulus B alone for 1 trial. Even without the
coincident E1 input, the amount of overlap between the A and B responses had now
increased to 46% (Fig. 7). The role of E1 in this case was to provide a common
stimulus component during learning which reinforced shared components of the
responses to input stimuli A and B.

Fig. 8. Results of differentiating cortical response patterns for similar stimuli. Left:
Response to stimulus A and stimulus B before training. Stimuli A and B activate 75% of
their input fibers in common and have a 77% overlap in cortical response patterns. Right:
Response to stimulus A and stimulus B after training A in the presence of modulatory input
E1 and training B with a different modulatory input E2. The overlap in cortical response
patterns is now 45%.

To test the ability of a state stimulus to induce differentiation of cortical 
response patterns for similar stimuli, we presented the model with a random input 
stimulus A for 1 trial, followed by 1 trial of a random input stimulus B (75% of the 
input fibers overlapping). The amount of overlap in the cortical responses for these 
untrained cases was 77%. Next, the model was trained for a period of 1 second on 
stimulus A in the presence of an additional random "state" stimulus E1 (a set of
10 input fibers not overlapping either A or B). It was then trained on input
stimulus B in the presence of a different random "state" stimulus E2 (10 input
fibers not overlapping A, B, or E1). After this training the model was presented
with stimulus A alone for 1 trial and stimulus B alone for 1 trial. The amount of
overlap was found to have decreased to 45% (Fig. 8). In this situation E1 and E2
provided a differential signal during learning which reinforced distinct
components of the responses to input stimuli A and B.

DISCUSSION 

Physiological Responses. Detailed discussion of the mechanisms underlying 
the simulated patterns of physiological activity in the cortex is beyond the scope 
of the current paper. However, the model has been of value in suggesting roles for
specific features of the cortex in generating physiologically recorded activity. For
example, while actual input to the cortex from the olfactory bulb is modulated into 
40 Hz bursts 24 , continuous stimulation of the model allowed us to demonstrate 
the model’s capability for intrinsic periodic activity independent of the comple- 
mentary pattern of stimulation from the olfactory bulb. While a similar ability has 
also been demonstrated by models of Freeman 25 , by studying this oscillating 
property in the model we were able to associate these oscillatory characteristics 
with specific interactions of local and distant network properties (e.g. inhibitory 
and excitatory time constants and trans-cortical axonal conduction velocities).
This result suggests underlying mechanisms for these oscillatory patterns which 
may be somewhat different than those previously proposed. 

Learning. The main subject of this paper is the examination of the learning 
capabilities of the cortical model. In this model, the apparently sparse, highly dis- 
tributed pattern of connectivity characteristic of piriform cortex is fundamental to 
the way in which the model learns. Essentially, the highly distributed pattern of
connections allows the model to develop stimulus-specific cortical response pat- 
terns by extracting correlations from randomly distributed input and association 
fiber activity. These correlations are, in effect, stored in the synaptic weights of 
the association fiber and local inhibitory connections. 

The model has also demonstrated robustness of a learned cortical response 
against degradation of the input signal. A key to this property is the action of 
sparsely distributed association fibers which provide reinforcement for previously
established patterns of cortical activity. This property arises from the modification
of synaptic weights due to correlations in activity between intra-cortical
association fibers. As a result of this modification the activity of a subset of pyramidal
neurons driven by a degraded input drives the remaining neurons in the response. 

In general, in the model, similar stimuli will map onto similar cortical respons- 
es and dissimilar stimuli will map onto dissimilar cortical responses. However, a 
presumably important function of the cortex is not simply to store sensory infor- 
mation, but to represent incoming stimuli as a function of the absolute stimulus 
qualities and the context in which the stimulus occurs. The fact that many of the 
structures that piriform cortex projects to (and receives projections from) may be 
involved in multimodal "state" generation 14 is circumstantial evidence that such 
modulation may occur. We have demonstrated in the model that such a modulato- 
ry input can modify the representations generated by pairs of stimuli so as to 
push the representations of like stimuli apart and pull the representations of dis- 
similar stimuli together. It should be pointed out that this modulatory input was 
not an "instructive" signal which explicitly directed the course of the representa- 
tion, but rather a "state" signal which did not require a priori knowledge of the 
representational structure. In the model, this modulatory phenomenon is a simple 
consequence of the degree of overlap in the combined (odor stimulus + modulator) 
stimulus. Both cases approached approximately 50% overlap in cortical responses 
reflecting the approximately 50% overlap in the combined stimuli for both cases. 


Of interest was the use of the model’s reconstructive capabilities to maintain the 
modulated response to each input stimulus even in the absence of the modulatory 
input. 
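
As a rough check on the overlap of the combined stimuli, the sketch below treats each combined stimulus (odor plus modulator) as a binary pattern over input fibers. For two such patterns with the same number of active fibers, the normalized dot product reduces to shared fibers divided by active fibers. The function name is hypothetical, and the nominal value of 7.5 shared fibers for the 75% case is an assumption, since that overlap cannot be realized exactly with 10 fibers.

```python
def combined_overlap(n_odor, shared_odor, n_state, shared_state):
    """Percent overlap of two binary (odor + state) input patterns.

    n_odor, n_state: active fibers per odor stimulus and per state stimulus.
    shared_odor, shared_state: how many of those fibers the two patterns share.
    """
    shared = shared_odor + shared_state
    total = n_odor + n_state  # active fibers in each combined stimulus
    return 100.0 * shared / total

# Merging: A and B share no fibers, but both are paired with the same E1.
merge_case = combined_overlap(n_odor=10, shared_odor=0.0,
                              n_state=10, shared_state=10.0)
# Differentiating: A and B share 75% of fibers, paired with distinct E1 and E2.
differentiate_case = combined_overlap(n_odor=10, shared_odor=7.5,
                                      n_state=10, shared_state=0.0)
```

Under this simple reading the merging case gives 50% and the differentiating case about 37.5%, so both combined stimuli sit between the extremes of the unmodulated pairs (0% and 75%), which is the direction of the effect seen in the cortical responses.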

CAVEATS AND CONCLUSIONS 

Our approach to studying this system involves using computer simulation to 
investigate mechanisms of information processing which could be implemented 
given what is known about biological constraints. The significance of results pre- 
sented here lies primarily in the finding that the structure of the model and the 
parameter settings which were appropriate for the reproduction of physiological 
responses were also appropriate for the proper convergence of a simple, biologi- 
cally plausible learning rule under various conditions. Of course, the model we 
have developed is only an approximation to the actual cortex limited by our knowl- 
edge of its organization and the computing power available. For example, the 
actual piriform cortex of the rat contains on the order of 10^6 cells (compared with
10^2 in the simulations) with a sparsity of connection on the order of p=0.001
(compared with p=0.05 in the simulations). Our continuing research effort will 
include explorations of the scaling properties of the network. 
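
To make the scaling gap concrete, the expected number of inputs per cell from one fiber system is simply the number of source cells times the connection probability. The sketch below uses the figures quoted above; the function name is hypothetical, and independent connections are assumed.

```python
def expected_inputs_per_cell(n_cells, p):
    """Expected number of contacts a cell receives from n_cells sources,
    each connecting independently with probability p."""
    return n_cells * p

model_fan_in = expected_inputs_per_cell(10**2, 0.05)
cortex_fan_in = expected_inputs_per_cell(10**6, 0.001)
ratio = cortex_fan_in / model_fan_in
```

So although the actual cortex is far sparser in relative terms, each cell there would receive on the order of a thousand inputs, compared with about five in the simulations.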

Other assumptions made in the context of the current model include the 
assumption that the representation of information in piriform cortex is in the form 
of spatial distributions of rate-coded outputs. Information contained in the spatio- 
temporal patterns of activity was not analyzed, although preliminary observation 
suggests that this may be of significance. In fact, the dynamics of the model itself 
suggest that temporally encoded information in the input at various time scales 
may be resolvable by the cortex. Additionally, the output of the cortex was 
assumed to have spatial uniformity, i.e. no differential weighting of information 
was made on the basis of spatial location in the cortex. But again, observation of 
the dynamics of the model, as well as the details of known anatomical distribution 
patterns for axonal connections, indicate that this is a major oversimplification. 
Preliminary evidence from the model would indicate that some form of hierarchical 
structuring of information along rostral/caudal lines may occur. For example, it
may be that cells found in progressively more rostral locations would have 
increasingly non-specific odor responses. 

Further investigations of learning within the model will explore each of these 
issues more fully, with attempts to correlate simulated findings with actual record- 
ings from awake, behaving animals. At the same time, new data pertaining to the 
structure of the cortex will be incorporated into the model as it emerges. 

$n_{cells}$ = number of cells in the simulation
$\Delta s$ = distance between adjacent cells
$d_k$ = duration of conductance change due to input type k
$v_k$ = velocity of signals for input type k
$\epsilon_k$ = latency for input type k
$\rho_k$ = spatial attenuation factor for input type k
$\rho_k^{min}$ = minimum spatial attenuation for input type k
$\Delta t_r$ = refractory period
$T_j$ = threshold for cell j
$L_{ij}$ = distance from cell i to cell j
$A_k$ = distribution of synaptic density for input type k
$w_{ij}$ = synaptic weight from cell j to cell i
$g_{ik}(t)$ = conductance due to input type k in cell i
$F_k(\tau)$ = conductance waveform for input type k

$n_{chan}$ = number of different channels per segment
$r_m^n$ = membrane resistance for segment n
$g_c^n(t)$ = conductance of channel c in segment n
$E_c$ = equilibrium potential associated with channel c
$I^{n \pm 1,n}(t)$ = current between segment n±1 and n
$I_m^n(t)$ = membrane current for segment n
$l_n$ = length of segment n
$d_n$ = diameter of segment n
$R_m$ = membrane resistivity
$R_i$ = intracellular resistivity per unit length
$R_e$ = extracellular resistance per unit length
$C_m$ = capacitance per unit surface area

ABSTRACT

Much experimental study of real neural networks relies on the proper classification of 
extracellularly sampled neural signals (i.e. action potentials) recorded from the brains of
experimental animals. In most neurophysiology laboratories this classification task is simplified
by limiting investigations to single, electrically well-isolated neurons recorded one at a time. 
However, for those interested in sampling the activities of many single neurons simultaneously, 
waveform classification becomes a serious concern. In this paper we describe and contrast
three approaches to this problem each designed not only to recognize isolated neural events, 
but also to separately classify temporally overlapping events in real time. First we present two 
formulations of waveform classification using a neural network template matching approach. 
These two formulations are then compared to a simple template matching implementation. 
Analysis with real neural signals reveals that simple template matching is a better solution to 
this problem than either neural network approach. 

INTRODUCTION 

For many years, neurobiologists have been studying the nervous system by 
using single electrodes to serially sample the electrical activity of single neu- 
rons in the brain. However, as physiologists and theorists have become more 
aware of the complex, nonlinear dynamics of these networks, it has become 
apparent that serial sampling strategies may not provide all the information 
necessary to understand functional organization. In addition, it will likely be 
necessary to develop new techniques which sample the activities of multiple
neurons simultaneously*. Over the last several years, we have developed two
different methods to acquire multineuron data. Our initial design involved
the placement of many tiny microelectrodes individually in a tightly packed
pseudo-floating configuration within the brain^. More recently we have been
developing a more sophisticated approach which utilizes recent advances in
silicon technology to fabricate multi-ported silicon based electrodes (Fig. 1).
Using these electrodes we expect to be able to readily record the activity
patterns of larger numbers of neurons.

As research in multi-single neuron recording techniques continues, it has
become very clear that whatever technique is used to acquire neural signals from
many brain locations, the technical difficulties associated with sampling, data
compressing, storing, analyzing and interpreting these signals largely dwarf the
development of the sampling device itself. In this report we specifically consider
the need to assure that neural action potentials (also known as "spikes") on
each of many parallel recording channels are correctly classified, which is just
one aspect of the problem of post-processing multi-single neuron data. With
more traditional single electrode/single neuron recordings, this task usually
involves passing analog signals through a Schmitt trigger whose output indicates
the occurrence of an event to a computer, at the same time as it triggers an
oscilloscope sweep of the analog data. The experimenter visually monitors the
oscilloscope to verify the accuracy of the discrimination, as a well-discriminated
signal from a single neuron will overlap on successive oscilloscope traces (Fig.
1c). Obviously this approach is impractical when large numbers of channels
are recorded at the same time. Instead, it is necessary to automate this
classification procedure. In this paper we will describe and contrast three approaches
we have developed to do this.
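
A software analogue of the trigger stage can be sketched as follows. This is an illustration only: the two hysteresis thresholds, the number of pre-trigger samples and the function names are assumptions, while the 40-sample window corresponds to the 2 msec buffer at 20 kHz used in METHODS.

```python
def schmitt_events(signal, high, low):
    """Return sample indices where the signal reaches `high` while armed.

    After firing, the trigger re-arms only once the signal has fallen to
    `low` or below, so one spike produces one event.
    """
    events = []
    armed = True
    for i, v in enumerate(signal):
        if armed and v >= high:
            events.append(i)
            armed = False
        elif not armed and v <= low:
            armed = True
    return events

def extract_window(signal, index, n_samples=40, pre=10):
    """Cut n_samples around a trigger, starting `pre` samples before it."""
    start = max(0, index - pre)
    return signal[start:start + n_samples]

# Synthetic trace with two brief events on a flat baseline.
trace = ([0.0] * 20 + [0.2, 0.9, 1.5, 1.1, 0.8, 0.3] + [0.0] * 30
         + [0.4, 1.2, 0.6] + [0.0] * 40)
events = schmitt_events(trace, high=1.0, low=0.5)
windows = [extract_window(trace, i) for i in events]
```

Each extracted window is what a classifier would then compare against stored templates; the trigger alone says only that something crossed threshold, not which neuron produced it.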



Fig. 1. Silicon probe being developed in our laboratory for multi-single unit recording
in cerebellar cortex, a) a complete probe; b) surface view of one recording tip; c) several 
superimposed neuronal action potentials recorded from such a silicon electrode in cerebellar 
cortex. 

While our principal design objective is the assurance that neural waveforms
are adequately discriminated on multiple channels, technically the overall
objective of this research project is to sample from as many single neurons as
possible. Therefore, it is a natural extension of our effort to develop a neural
waveform classification scheme robust enough to allow us to distinguish
activities arising from more than one neuron per recording site. To do this, however,
we now not only have to determine that a particular signal is neural in origin,
but also from which of several possible neurons it arose (see Fig. 2a). While
in general signals from different neurons have different waveforms, aiding in
the classification, neurons recorded on the same channel firing simultaneously
or nearly simultaneously will produce novel combination waveforms (Fig. 2b)
which also need to be classified. It is this last complication which particularly
bedevils previous efforts to classify neural signals (for review see 5, also see
3-4). In summary, then, our objective was to design a circuit that would:

1. distinguish different waveforms even though neuronal discharges tend 
to be quite similar in shape (Fig. 2a); 

2. recognize the same waveform even though unavoidable movements 
such as animal respiration often result in periodic changes in the amplitude 
of a recorded signal by moving the brain relative to the tip of the electrode; 

3. be considerably robust to recording noise which variably corrupts all 
neural recordings (Fig. 2); 

4. resolve overlapping waveforms, which are likely to be particularly in- 
teresting events from a neurobiological point of view; 

5. provide real-time performance allowing the experimenter to detect 
problems with discrimination and monitor the progress of the experiment; 

6. be implementable in hardware due to the need to classify neural sig- 
nals on many channels simultaneously. Simply duplicating a software-based 
algorithm for each channel will not work, but rather, multiple, small, in- 
dependent, and programmable hardware devices need to be constructed. 




Fig. 2. a) Schematic diagram of an electrode recording from two neuronal cell bodies. b) An
actual multi-neuron recording. Note the similarities in the two waveforms and the overlapping 
event, c) and d) Synthesized data with different noise levels for testing classification algorithms 
(c: 0.3 NSR ; d: 1.1 NSR). 


METHODS 


The problem of detecting and classifying multiple neural signals on sin- 
gle voltage records involves two steps. First, the waveforms that are present 
in a particular signal must be identified and templates generated;
second, these waveforms must be detected and classified in ongoing data 
records. To accomplish the first step we have modified the principal com- 
ponent analysis procedure described by Abeles and Goldstein^ to automat- 
ically extract templates of the distinct waveforms found in an initial sam- 
ple of the digitized analog data. This will not be discussed further as it is 
the means of accomplishing the second step which concerns us here. Specif- 
ically, in this paper we compare three new approaches to ongoing wave- 
form classification which deal explicitly with overlapping spikes and vari- 
ably meet other design criteria outlined above. These approaches consist of 
a modified template matching scheme, and two applied neural network im- 
plementations. We will first consider the neural network approaches. On 
a point of nomenclature, to avoid confusion in what follows, the real neu- 
rons whose signals we want to classify will be referred to as “neurons” while 
computing elements in the applied neural networks will be called “Hopons.” 

Neural Network Approach — Overall, the problem of classifying neural 
waveforms can best be seen as an optimization problem in the presence of 
noise. Much recent work on neural-type network algorithms has demonstrated
that these networks work quite well on problems of this sort^"®. In particular,
in a recent paper Hopfield and Tank describe an A/D converter network and
suggest how to map the problem of template matching into a similar context^.
The energy functional for the network they propose has the form:

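$$E = -\frac{1}{2}\sum_i \sum_j T_{ij} V_i V_j - \sum_i V_i I_i \tag{1}$$

where $T_{ij}$ = connectivity between Hopon $i$ and Hopon $j$, $V_i$ = voltage output
of Hopon $i$, $I_i$ = input current to Hopon $i$, and each Hopon has a sigmoid
input-output characteristic $V = g(u) = 1/(1 + \exp(-\alpha u))$.

If the equation of motion is set to be:

$$\frac{du_i}{dt} = -\frac{\partial E}{\partial V_i} = \sum_j T_{ij} V_j + I_i \tag{1a}$$

then we see that

$$\frac{dE}{dt} = -\sum_i \Big(\sum_j T_{ij} V_j + I_i\Big)\frac{dV_i}{dt} = -\sum_i \frac{du_i}{dt}\,\frac{dV_i}{dt} = -\sum_i g'(u_i)\Big(\frac{du_i}{dt}\Big)^2 \le 0,$$

where the first equality relies on the symmetry of $T_{ij}$ and the last on
$dV_i/dt = g'(u_i)\,du_i/dt$ with $g' > 0$. Hence E will go to a minimum which, in a network
constructed as described below, will correspond to a proposed solution to a
particular waveform classification problem.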
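
The descent property can be checked numerically. The sketch below integrates the equation of motion with a simple forward Euler step on a small random symmetric network and records E at each step. The network size, random weights, step size and function names are all arbitrary choices for illustration, not values from the paper.

```python
import math
import random

def sigmoid(u, alpha=1.0):
    """Hopon input-output characteristic V = g(u)."""
    return 1.0 / (1.0 + math.exp(-alpha * u))

def energy(T, I, V):
    """E = -1/2 sum_ij T_ij V_i V_j - sum_i V_i I_i."""
    n = len(V)
    quad = sum(T[i][j] * V[i] * V[j] for i in range(n) for j in range(n))
    lin = sum(I[i] * V[i] for i in range(n))
    return -0.5 * quad - lin

def run(T, I, u0, dt=0.01, steps=2000):
    """Euler-integrate du_i/dt = sum_j T_ij V_j + I_i; return E at each step."""
    u = list(u0)
    n = len(u)
    energies = []
    for _ in range(steps):
        V = [sigmoid(x) for x in u]
        energies.append(energy(T, I, V))
        du = [sum(T[i][j] * V[j] for j in range(n)) + I[i] for i in range(n)]
        u = [u[i] + dt * du[i] for i in range(n)]
    return energies

rng = random.Random(0)
n = 6
# Symmetric connection matrix with zero diagonal, as the argument requires.
T = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        T[i][j] = T[j][i] = rng.uniform(-1.0, 1.0)
I = [rng.uniform(-1.0, 1.0) for _ in range(n)]
u0 = [rng.uniform(-0.5, 0.5) for _ in range(n)]

energies = run(T, I, u0)
largest_increase = max(b - a for a, b in zip(energies, energies[1:]))
```

With a small enough step the recorded energy never rises by more than rounding error, in line with dE/dt being non-positive. An asymmetric T would break the first equality in the derivation and with it this guarantee.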

Template Matching using a Hopfield-type Neural Net — We have 
taken the following approach to template matching using a neural network. For 
simplicity, we initially restricted the classification problem to one involving two 
waveforms and have accordingly constructed a neural network made up of two 
groups of Hopons, each concerned with discriminating one or the other waveform.
The classification procedure works as follows: first, a Schmitt trigger
is used to detect the presence of a voltage on the signal channel above a set
threshold. When this threshold is crossed, implying the presence of a possible
neural signal, 2 msecs of data around the crossing are stored in a buffer (40
samples at 20 kHz). Note that biophysical limitations assure that a single real
neuron cannot discharge more than once in this time period, so only one wave- 
form of a particular type can occur in this data sample. Also, action potentials 
are of the order of 1 msec in duration, so the 2 msec window will include the full 
signal for single or overlapped waveforms. In the next step (explained later) 
the data values are correlated and passed into a Hopfield network designed to 
minimize the mean-square error between the actual data and the linear com- 
bination of different delays of the templates. Each Hopon in the set of Hopons 
concerned with one waveform represents a particular temporal delay in the 
occurrence of that waveform in the buffer. To express the network in terms of 
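an energy function formulation: let $x(t)$ = input waveform amplitude in the
$t$-th time bin, $s_j(t)$ = amplitude of the $j$-th template, and let $V_{jk}$ denote whether
$s_j(t-k)$ (the $j$-th template delayed by $k$ time bins) is present in the input
waveform. Then the appropriate energy function is:

$$E = \frac{1}{2}\sum_t \Big[x(t) - \sum_{j,k} V_{jk}\, s_j(t-k)\Big]^2 - \frac{1}{2}\sum_t \sum_{j,k} V_{jk}(V_{jk} - 1)\, s_j^2(t-k) + \gamma \sum_j \sum_{k_1 < k_2} V_{jk_1} V_{jk_2} \tag{2}$$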

The first term is designed to minimize the mean-square error and specifies
the best match. Since $V \in [0, 1]$, the second term is minimized only when each
$V_{jk}$ assumes values 0 or 1. It also sets the diagonal elements $T_{ij}$ to 0. The
third term creates mutual inhibition among the processing nodes evaluating
the same neuronal signal, which as described above can only occur once per
sample.

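Expanding and simplifying expression (2), the connection matrix is:

$$T_{(j_1,k_1),(j_2,k_2)} = \begin{cases} -\sum_t s_{j_1}(t-k_1)\, s_{j_2}(t-k_2) - \gamma\,\delta_{j_1 j_2} & \text{otherwise} \\ 0 & \text{if } j_1 = j_2,\ k_1 = k_2 \end{cases} \tag{3a}$$

and the input current

$$I_{jk} = \sum_t x(t)\, s_j(t-k) - \frac{1}{2}\sum_t s_j^2(t-k) \tag{3b}$$

where $\gamma$ is the coefficient of the third term in (2) and $\delta$ is the Kronecker
delta. As can be seen, the inputs are the correlations between the actual data and
the various delays of the templates, minus a constant term.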

Modified Hopfield Network — As documented in more detail in Fig. 
3-4, the above full Hopfield-type network works well for temporally isolated 
spikes at moderate noise levels, but for overlapping spikes it has a local minima 
problem. This is more severe with more than two waveforms in the network. 


Further, we need to build our network in hardware and the full Hopfield net- 
work is difficult to implement with current technology (see below). For these 
reasons, we developed a modified neural network approach which significantly 
reduces the necessary hardware complexity and also has improved performance. 
To understand how this works, let us look at the information contained in the
quantities $T_{ij}$ and $I_{ij}$ (eqs. 3a and 3b) and make some use of them. These
quantities have to be calculated at a pre-processing stage before being loaded
into the Hopfield network. If after calculating these quantities, we can quickly
rule out a large number of possible template combinations, then we can significantly
reduce the size of the problem and thus use a much smaller (and
hence more efficient) neural network to find the optimal solution. To make the
derivation simple, we define slightly modified versions of $T_{ij}$ and $I_{ij}$ (eqs. 4a
and 4b) for the two-template case.

$$T_{ij} = \sum_t s_1(t-i)\, s_2(t-j) \tag{4a}$$

$$I_{ij} = \sum_t x(t)\Big[\tfrac{1}{2} s_1(t-i) + \tfrac{1}{2} s_2(t-j)\Big] - \frac{1}{2}\sum_t s_1^2(t-i) - \frac{1}{2}\sum_t s_2^2(t-j) \tag{4b}$$

In the case of overlapping spikes the $T_{ij}$'s are the cross-correlations between $s_1(t)$
and $s_2(t)$ with different delays and the $I_{ij}$'s are the cross-correlations between input
$x(t)$ and a weighted combination of $s_1(t)$ and $s_2(t)$. Now if $x(t) = s_1(t-i) + s_2(t-j)$
(i.e. the overlap of the first template with $i$ time bin delay and the
second template with $j$ time bin delay), then $\Delta_{ij} = |T_{ij} - I_{ij}| = 0$. However,
in the presence of noise, $\Delta_{ij}$ will not be identically zero, but will equal the
noise, and if $\Delta_{ij} > \Delta T_{ij}$ (where $\Delta T_{ij} = |T_{ij} - T_{i'j'}|$ for $i \neq i'$ and $j = j'$) this
simple algorithm may make unacceptable errors. A solution to this problem
for overlapping spikes will be described below, but now let us consider the
problem of classifying non-overlapping spikes. In this case, we can compare
the input cross-correlation with the auto-correlations (eqs. 4c and 4d).

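$$T'_i = \sum_t s_1^2(t-i); \qquad T''_i = \sum_t s_2^2(t-i) \tag{4c}$$

$$I'_i = \sum_t x(t)\, s_1(t-i); \qquad I''_i = \sum_t x(t)\, s_2(t-i) \tag{4d}$$

So for non-overlapping cases, if $x(t) = s_1(t-i)$, then $\Delta'_i = |T'_i - I'_i| = 0$. If
$x(t) = s_2(t-i)$, then $\Delta''_i = |T''_i - I''_i| = 0$.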

In the absence of noise, the minimum of $\Delta_{ij}$, $\Delta'_i$ and $\Delta''_i$ represents the
correct classification. However, in the presence of noise, none of these quantities
will be identically zero, but will equal the noise in the input $x(t)$, which will
give rise to unacceptable errors. Our solution to this noise-related problem is
to choose a few minima (three were chosen in our case) instead of one. For
each minimum there is either a known corresponding linear combination of
templates for overlapping cases or a simple template for non-overlapping cases.
A three-neuron Hopfield-type network is then programmed so that each neuron
corresponds to one of the cases. The input $x(t)$ is fed to this tiny network to
resolve whatever confusion remains after the first step of “cross-correlation”
comparisons. (Note: Simple template matching as described below can also be
used in place of the tiny Hopfield-type network.)
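
The pre-selection step can be sketched as follows. This is an illustrative reading of eqs. 4a to 4d, not the authors' code: the helper names, the zero-padded window and the toy templates are assumptions, and the final decision among the three candidates is made here by simple template matching (the substitution the note above allows) rather than by a three-neuron network.

```python
def shifted(template, delay, length):
    """Template delayed by `delay` bins inside a zero-filled window."""
    out = [0.0] * length
    for m, value in enumerate(template):
        out[delay + m] = value
    return out

def corr(a, b):
    """Zero-lag correlation (sum of products) of two equal-length windows."""
    return sum(p * q for p, q in zip(a, b))

def shortlist(x, s1, s2, keep=3):
    """Return the `keep` cases with the smallest Delta, as (delta, label, waveform)."""
    n = len(x)
    w1 = [shifted(s1, i, n) for i in range(n - len(s1) + 1)]
    w2 = [shifted(s2, j, n) for j in range(n - len(s2) + 1)]
    scored = []
    for i, a in enumerate(w1):
        # Non-overlapping case, first template: |T'_i - I'_i| (eqs. 4c, 4d)
        scored.append((abs(corr(a, a) - corr(x, a)), ("s1", i), a))
        for j, b in enumerate(w2):
            combo = [p + q for p, q in zip(a, b)]
            t_ij = corr(a, b)                                              # eq. 4a
            i_ij = 0.5 * corr(x, combo) - 0.5 * corr(a, a) - 0.5 * corr(b, b)  # eq. 4b
            scored.append((abs(t_ij - i_ij), ("s1+s2", i, j), combo))
    for j, b in enumerate(w2):
        # Non-overlapping case, second template: |T''_j - I''_j|
        scored.append((abs(corr(b, b) - corr(x, b)), ("s2", j), b))
    scored.sort(key=lambda c: c[0])
    return scored[:keep]

def classify(x, s1, s2):
    """Resolve the short list by the smallest sum of absolute differences."""
    finalists = shortlist(x, s1, s2)
    best = min(finalists, key=lambda c: sum(abs(p - q) for p, q in zip(x, c[2])))
    return best[1]

# Toy templates (hypothetical) and a noiseless overlap with delays 2 and 3.
s1, s2 = [1, 3, -2], [2, -1, 1]
x = [p + q for p, q in zip(shifted(s1, 2, 10), shifted(s2, 3, 10))]
label = classify(x, s1, s2)
```

Only the three surviving cases are compared in detail, which is where the reduction in network size comes from.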


Simple Template Matching — To evaluate the performances of these 
neural network approaches, we decided to implement a simple template match- 
ing scheme, which we will now describe. However, as documented below, this 
approach turned out to be the most accurate and require the least complex 
hardware of any of the three approaches. The first step is, again, to fill a buffer 
with data based on the detection of a possible neural signal. Then we calculate 
the difference between the recorded waveform and all possible combinations of 
the two previously identified templates. Formally, this consists of calculating
the distances between the input $x(t)$ and all possible cases generated by all
the combinations of the two templates.

$$d_{ij} = \sum_t \big|x(t) - \{s_1(t-i) + s_2(t-j)\}\big|$$

$$d'_i = \sum_t |x(t) - s_1(t-i)|; \qquad d''_i = \sum_t |x(t) - s_2(t-i)|$$

$$d_{min} = \min(d_{ij}, d'_i, d''_i)$$

$d_{min}$ gives the best fit of all possible combinations of templates to the actual
voltage signal.
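
A minimal sketch of this search, under stated assumptions: the templates are short hypothetical lists, each delayed template is zero-padded to the buffer length, and every delay that keeps the template inside the buffer is tried. Note that the inner loops use only additions, subtractions and absolute values.

```python
def shifted(template, delay, length):
    """Template delayed by `delay` bins inside a zero-filled window."""
    out = [0.0] * length
    for m, value in enumerate(template):
        out[delay + m] = value
    return out

def l1(a, b):
    """Sum of absolute differences between two equal-length windows."""
    return sum(abs(p - q) for p, q in zip(a, b))

def classify(x, s1, s2):
    """Return (d_min, label) over single templates and all overlaps."""
    n = len(x)
    delays1 = range(n - len(s1) + 1)
    delays2 = range(n - len(s2) + 1)
    candidates = []
    for i in delays1:
        w1 = shifted(s1, i, n)
        candidates.append((l1(x, w1), ("s1", i)))            # d'_i
        for j in delays2:
            w2 = shifted(s2, j, n)
            both = [p + q for p, q in zip(w1, w2)]
            candidates.append((l1(x, both), ("s1+s2", i, j)))  # d_ij
    for j in delays2:
        candidates.append((l1(x, shifted(s2, j, n)), ("s2", j)))  # d''_j
    return min(candidates, key=lambda c: c[0])

# Toy example: the two hypothetical templates overlap with delays 2 and 3.
s1, s2 = [1, 3, -2], [2, -1, 1]
x = [p + q for p, q in zip(shifted(s1, 2, 10), shifted(s2, 3, 10))]
d_min, label = classify(x, s1, s2)
```

The returned distance doubles as the goodness-of-fit estimate: a large `d_min` indicates an event that matches no template combination well.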


TESTING PROCEDURES 

To compare the performance of each of the three approaches, we devised a 
common set of test data using the following procedures. First, we used the prin- 
cipal component method of Abeles and Goldstein^ to generate two templates 
from a digitized analog record of neural activity recorded in the cerebellum 
of the rat. The two actual spike waveform templates we decided to use had 
a peak-to-peak ratio of 1.375. From a second set of analog recordings made 
from a site in the cerebellum in which no action potential events were evident, 
we determined the spectral characteristics of the recording noise. These two 
components derived from real neural recordings were then digitally combined, 
the objective being to construct realistic records, while also knowing absolutely 
what the correct solution to the template matching problem was for each oc- 
curring spike. As shown in Fig. 2c and 2d, data sets corresponding to different 
noise to signal ratios were constructed. We also carried out simulations with 
the amplitudes of the templates themselves varied in the synthesized records to 
simulate waveform changes due to brain movements often seen in real recordings.
In addition to two-waveform test sets, we also constructed three-waveform
sets by generating a third template that was the average of the first two templates.
To further quantify the comparisons of the three different approaches
described above, we considered non-overlapping and overlapping spikes separately.
To quantify the performance of the three different approaches, two
standards for classification were devised. In the first and hardest case, to be
judged a correct classification, the precise order and timing of two waveforms
had to be reconstructed. In the second and looser scheme, classification was
judged correct if the order of two waveforms was correct but timing was allowed
to vary by ±100 μsecs (i.e. ±2 time bins), which for most neurobiological
applications is probably sufficient resolution. Figs. 3-4 compare the performance
results for the three approaches to waveform classification implemented
as digital simulations.


PERFORMANCE COMPARISON 


Two templates - non-overlapping waveforms: As shown in Fig. 3a, at
low noise-to-signal ratios (NSRs below .2) each of the three approaches was
comparable in performance, reaching close to 100% accuracy for each criterion.
As the ratio was increased, however, the neural network implementations did
less and less well with respect to the simple template matching algorithm, with
the full Hopfield-type network doing considerably worse than the modified
network. In the range of NSR most often found in real data (.2 - .4) simple
template matching performed considerably better than either of the neural
network approaches. Also it is to be noted that simple template matching
gives an estimate of the goodness of fit between the waveform and the closest
template, which could be used to identify events that should not be classified
(e.g. signals due to noise).

Fig. 3. Comparisons of the three approaches detecting two non-overlapping (a) and
overlapping (b) waveforms; (c) compares the performances of the neural network approaches
for different degrees of waveform overlap. Legend: light line, absolute criteria; heavy line,
less stringent criteria; separate curves for simple template matching, the Hopfield network,
and the modified Hopfield network.

Two templates - overlapping waveforms: Fig. 3b and 3c compare performances
when waveforms overlapped. In Fig. 3b the serious local minima problem
encountered in the full neural network is demonstrated, as is the improved
performance of the modified network. Again, overall performance in physiological
ranges of noise is clearly best for simple template matching. When
the noise level is low, the modified approach is the better of the two neural
networks due to the reliability of the correlation number, which reflects the
resemblance between the input data and the template. When the noise level
is high, errors in the correlation numbers may exclude the right combination
from the smaller network. In this case its performance is actually a little worse
than the larger Hopfield network. Fig. 3c documents in detail which degrees
of overlap produce the most trouble for the neural network approaches at average
NSR levels found in real neural data. It can be seen that for the neural
networks, the most serious problem is encountered when the delays between
the two waveforms are small enough that the resulting waveform looks like the
larger waveform with some perturbation.

Three templates - overlapping and non-overlapping: In Fig. 4 are shown 
the comparisons between the full Hopfield network approach and the simple 
template matching approach. For nonoverlapping waveforms, the performance 
of these two approaches is much more comparable than for the two waveform 
case (Fig. 4a), although simple template matching is still the optimal method. 
In the overlapping waveform condition, however, the neural network approach 
fails badly (Fig. 4b and 4c). For this particular application and implementa- 
tion, the neural network approach does not scale well. 




Fig. 4. Comparisons of performance for three waveforms: a) nonoverlapping waveforms; b)
two waveforms overlapping; c) three waveforms overlapping. Legend: light line, absolute
criteria; heavy line, less stringent criteria; separate curves for the Hopfield network and
simple template matching; σ = variance of the noise.


HARDWARE COMPARISONS 


As described earlier, an important design requirement for this work was the
ability to detect, in real time, neural signals in analog records originating from
many simultaneously active sampling electrodes. Because it is not feasible to
run the algorithms in a computer in real time for all the channels simultaneously,
it is necessary to design and build dedicated hardware for each channel.
To do this, we have decided to design VLSI implementations of our circuitry.
In this regard, it is well recognized that large modifiable neural networks need
very elaborate hardware implementations. Let us consider, for example,
implementing hardware for a two-template case for comparison. Let $n$ = no.
of neurons per template (one neuron for each delay of the template), $m$ =
no. of iterations to reach the stable state (in simulating the discretized differential
equation, with step size = 0.05), and $l$ = no. of samples in a template
$s_j(t)$. Then the number of connections in the full Hopfield network will be
$4n^2$, and the total no. of synaptic calculations is $4mn^2$. So, for two templates
and $n = 16$, $m = 100$, $4mn^2 = 102{,}400$. Thus building the full Hopfield-type
network digitally requires a system too large to be put in a single VLSI chip
that will work in real time. If we want to build an analog system, we need
to have many ($O(4n^2)$) easily modifiable synapses. As yet this technology is
not available for nets of this size. The modified Hopfield-type network, on the
other hand, is less technically demanding. To do the preprocessing to obtain
the minimum values we have to do about $n^2 = 256$ additions to find all possible
$I_{ij}$'s, and we require 256 subtractions and comparisons to find three minima. The
costs associated with doing input cross-correlations are the same as for the full
neural network (i.e. $2nl = 768$ multiplications, with $l = 24$). The saving with the
modified approach is that the network used is small and fast (120 multiplications
and 120 additions to construct the modifiable synapses; no. of synaptic
calculations = 90 with $m = 10$, $n = 3$).
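
The operation counts quoted above can be reproduced with a few lines of arithmetic. In this sketch the function names are hypothetical, and reading $4n^2$ as "every one of the $2n$ Hopons connected to every Hopon" is an interpretation rather than something the text states.

```python
def full_hopfield_ops(n, m, templates=2):
    """Connections and synaptic calculations for a fully connected net.

    With `templates` groups of n Hopons there are (templates * n)^2
    connections, i.e. 4 n^2 for two templates, each evaluated once per
    iteration for m iterations.
    """
    connections = (templates * n) ** 2
    return connections, m * connections

def correlation_multiplies(n, l, templates=2):
    """Multiplications needed for the input cross-correlations (2 n l)."""
    return templates * n * l

connections, synaptic = full_hopfield_ops(n=16, m=100)   # full network
multiplies = correlation_multiplies(n=16, l=24)          # shared preprocessing
small_net = full_hopfield_ops(n=3, m=10, templates=1)    # three-neuron network
```

The full network needs 102,400 synaptic calculations against 90 for the three-neuron network, on top of the same 768 correlation multiplications.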

In contrast to the neural networks, simple template matching is simple 
indeed. For example, it must perform about n'l -f n* = 10,496 additions and 
$n^2 = 256$ comparisons to find the minimum $d_{ij}$. Additions are considerably less
costly in time and hardware than multiplications. In fact, because this method 
needs only addition operations, our preliminary design work suggests it can be 
built on a single chip and will be able to do the two-template classification 
in as little as 20 microseconds. This actually raises the possibility that with 
switching and buffering one chip might be able to service more than one channel 
in essentially real time. 


CONCLUSIONS 

Template matching using a full Hopfield-type neural network is found to 
be robust to noise and changes in signal waveform for the two neural waveform 
classification problem. However, for a three-waveform case, the network does 
not perform well. Further, the network requires many modifiable connections 
and therefore results in an elaborate hardware implementation. The overall 
performance of the modified neural network approach is better than the full 
Hopfield network approach. The computation has been largely reduced and
the hardware requirements are considerably less demanding, demonstrating the
value of designing a specific network for a specified problem. However, even the
modified neural network performs less well than a simple template-matching
algorithm, which also has the simplest hardware implementation. Using the
simple template matching algorithm, our simulations suggest it will be possible
to build a two- or three-waveform classifier on a single VLSI chip using
CMOS technology that works in real time with excellent error characteristics.
Further, such a chip will be able to accurately classify variably overlapping
neural signals.

IMPLEMENTATION OF PATTERN-RECOGNITION
ALGORITHMS DERIVED FROM OLFACTORY INFORMATION PROCESSING 

Abstract 

Sensory and perceptual information exists as space-time patterns of neural 
activity in cortex in two modes. Neural analysis of sensory input, as in 
feature extraction, is done with action potentials of single neurons in 
point processes. Neural synthesis of input with past experience and expec- 
tancy of future action is done with dendritic integration in local mean 
fields. Both kinds of activity are found to coexist in olfactory and visual 
cortex, each preceding and then following the other. The transformation of 
information from the pulse mode to the dendritic mode involves a state 
transition of the cortical network that can be modeled by a Hopf bifurcation 
in both software and hardware embodiments. These models show robust powers 
for amplification and correct classification of noisy and incomplete 
patterns corresponding to sensory inputs to biological nervous systems in 
attentive and motivated animals. The evidence is reviewed and the 
requirements are summarized for machine simulations of these operations. 
IMPLEMENTATION OF PATTERN RECOGNITION ALGORITHMS
DERIVED FROM OLFACTORY INFORMATION PROCESSING

SUMMARY 

1. Modes of information in cerebral cortex - 

point process: action potential frequency 

local mean field: dendritic potential amplitude 

2. Spatial amplitude modulation of carrier waves - 

olfactory bulb of rabbit 
primary visual cortex of monkey 

3. Implementation with high-dimensional nonlinear ODEs 

linear integration

asymmetric sigmoid nonlinearity 

modifiable associational connections 

4. Comparison of software and hardware embodiments 

amplification and classification 


chaos and the tolerance of disorder 


MUCOSA 



The main cell types in the olfactory 
bulb are the mitral and granule cells. 

Mitral cells form a set of densely 
interconnected mutually excitatory 
cells. They also excite the granule 
cells. Mitral cell axons carry the 
output signal to the rest of the brain 


Granule cells form a set of densely 
interconnected mutually inhibitory neurons 
They also inhibit the mitral cells. 


These two cell types are connected 
in a negative feedback loop. They 
form a neural oscillator. The 
olfactory bulb consists of approx. 
2000 such coupled oscillators. 


Excitatory couplings provide modifiable 
synapses in learning and perception. 

Inhibitory couplings provide stability
and spatial contrast. 
prepyriform cortex

THE BULB GENERATES A BRIEF OSCILLATORY EEG BURST WITH EACH INSPIRATION,
WHETHER OR NOT A CONDITIONED STIMULUS ODOR IS PRESENTED. THIS RECORDING
IS FROM ONE TRIAL IN A WAKING RABBIT. THE SAME PATTERN OF EEG IS
FOUND OVER THE WHOLE MAIN BULB.
THE SPATIAL PATTERN OF ROOT MEAN SQUARE (RMS) AMPLITUDE FROM 64 EPIDURAL
ELECTRODES (4 X 4 MM) CHANGES UNDER CONDITIONING AND RESTABILIZES AFTER
EACH NEW TRAINING ODOR.
DISCRIMINANT ANALYSIS OF THE FACTOR SCORES SHOWS THAT 75% OF BURSTS ARE
CORRECTLY CLASSIFIED WITH 2 DISCRIMINANT FUNCTIONS. A PLOT IS SHOWN OF THE
DISCRIMINANT SPACE FOR ONE RABBIT.
BASIC ELEMENTS

1. Linear integrator - 2nd order.

2. Static sigmoid nonlinearity.

3. Hebb connection & assembly.

4. Parallel input & output.

KEY PROPERTIES

1. Chaotic basal state.

2. Input-dependent gain.

3. Bifurcation on input.

4. Spatial pattern coding.
CORRELATION LEARNING RULE: a modified Hebb rule

changing synapse in the following way: if both channel j and k are on,
Kee(j,k) = Kee(high); otherwise, unchanged.

For the reduced interconnected KII set with 64-channel, the figure shows 100%
clustering as well as perfect grouping.

The output waveform in the 64-channel case
MULTI ELECTRODE BURST PATTERN FEATURE
EXTRACTION FROM SMALL MAMMALIAN NETWORKS IN CULTURE 

Abstract 

We are investigating the properties of small (100 to 400 neuron), monolayer 
networks grown in culture from dissociated mouse spinal cord tissue on glass 
plates featuring 64 photoetched microelectrodes. These networks form vigor- 
ous organotypic activity that becomes organized with time. At 4 weeks after 
seeding, the cultures exhibit synchronized burst patterns on many elec- 
trodes. The spontaneous activity can be maintained for as long as 100 days 
in culture and is often rhythmic. Network disinhibition via the inhibitory
synapse blockers strychnine and bicuculline produces rapid, rhythmic burst- 
ing in all cultures with highly stereotyped spike patterns within the 
bursts, regardless of the nature of the prior spontaneous activity. Data 
are processed on two levels: (1) burst pattern analysis in terms of burst 
frequency, duration, and period, and (2) analysis of spike patterns within 
bursts. Data compression is achieved by burst integration which preserves 
the character of the spike patterns. Integrated bursts are being classified 
according to shape and identified with letters, allowing 2 hours of activity 
on one electrode to be condensed to one page of letter sequences. Pattern 
recognition and cross-correlations with other electrodes are therewith 
simplified. In view of the fact that all synapses integrate spike trains, 
the ignoring of detailed spike information is reasonable and makes real- 
time, statistical analysis of compressed multichannel data possible. 
A HYBRID CONNECTIONIST/AI ARCHITECTURE FOR
REFLEXIVE, REFLECTIVE, AND EXPLORATORY SYSTEMS

Abstract 

Implementing autonomous systems capable of operation in both familiar and 
unfamiliar environments is a goal being hotly pursued in many communities, 
both military and commercial. This presentation discusses an autonomous 
system philosophy and architecture which combines both trainable and non- 
trainable artificial neural network elements for both mapping and dynamic 
system simulation, with rule-based decision-directed structures. The archi- 
tecture presented should manifest interesting behavior characteristics, 
including the ability to handle an unbounded dimensional environment, the 
ability to provide rapid reflex response as well as to enable reflecting and
planning actions, and the ability to execute generic exploratory behaviors 
designed to enhance knowledge of the environment when it is ambiguous. A 
methodology for embedding both discrete rules and apprentice learning in the 
system is used to initialize the autonomous system. It is hoped that this 
presentation will help air some of the issues faced by neural network 
designers attempting to merge connectionist and rule-based technologies.
FLOPS: A PARALLEL-RULE-FIRING FUZZY EXPERT SYSTEM SHELL

Abstract 

The use of fuzzy systems theory as a basis for expert systems is reviewed 
with particular reference to a fuzzy expert system having rules that are 
fired in parallel. Examples are given of fuzzy sets, fuzzy numbers, and 
fuzzy logic, and their use in pattern recognition and process control. 
Fuzzy systems theory may be looked upon as furnishing ways of processing 
information which is uncertain, imprecise, vague, ambiguous, or contra- 
dictory. This paper is concentrated on the use of fuzzy systems theory in 
processing information which is ambiguous or contradictory, rather than 
uncertain, vague, or imprecise. We also show how advantages can be reaped 
from the intentional introduction of ambiguities in description, even in a 
field as objective as process control. 


FLOPS - A GENERAL-PURPOSE 
FUZZY EXPERT SYSTEM SHELL 
PROCEDURAL LANGUAGES (FORTRAN, PASCAL, C)

and

NON-PROCEDURAL LANGUAGES (EXPERT PRODUCTION SYSTEMS)

Advantages and Disadvantages for Problem Solving


                          PROCEDURAL     NON-PROCEDURAL

Speed                     Fast           Very Slow

Numerical Computation     Very Good      Very Bad

Symbolic Computation      Poor           Excellent
(Reasoning)


Best power obtained from a mix of procedural and non-procedural languages


BLACKBOARD EXPERT SYSTEM FRAMEWORK 


SYSTEM SUPERVISOR 
AND SCHEDULER 


EXPERT 

SUBSYSTEM 


PROCEDURAL 

PROGRAM 


PROCEDURAL 

PROGRAM 


DATA COMMUNICATION AREA 


BLACKBOARD SYSTEM PERMITS COMBINING ADVANTAGES OF 
BOTH PROCEDURAL AND NON-PROCEDURAL LANGUAGES. 


REQUIREMENTS : 

Expert system should be able to call programs or 
systems in any other language; control should revert to 
expert system on program or system termination. 

A flexible common framework should be chosen for data, 
so the programs can communicate effectively and 
conveniently with each other. 

Few expert system shells meet these requirements . 




HANDLING UNCERTAINTIES 


Candidate Methods: 

Ad hoc methods (e.g. Mycin, M1)

Bayes' Theorem (e.g. Prospector derivatives)
Dempster-Shafer Theory
Fuzzy Systems Theory

HANDLING AMBIGUITIES: 

An ambiguity: the situation when more than one of 
several possibilities might be true . 

A Contradiction: only one of several possibilities is 
true , but we don't know which one . 

Candidate Methods: 

Ad hoc methods (virtually non-existent) 

Probability distributions (theoretically possible, but 
very awkward and seldom if ever done) 


Fuzzy Systems Theory (extremely easy) 


EXAMPLES OF A FUZZY SET 


ATHLETES IN MASH 4077


Member                  Grade of Membership

Lt. Col. Penobscot      1.0
Father Mulcahey         0.8
Major Houlihan          0.6
B. J. Honeycutt         0.5
Klinger                 0.4
Hawkeye Pierce          0.15


OBJECTS ON A RADAR SCREEN


Member                  Grade of Membership

Commercial Airliner     0.02
Military Jet            0.01
Light Plane             0.13
Powered Ultralight      0.44
Hang Glider             0.63
Santa Claus             0.99


EXAMPLE OF A FUZZY NUMBER: A FUZZY TWO 


[Plot: grade of membership (vertical axis, marked at 0.5 and 1.0) against
member value (horizontal axis, 0 to 4) for a fuzzy two.]


EXAMPLE OF FUZZY LOGIC: 


My talk will be a success if the material is
interesting, the visual material good and the audience
is really interested or if the talk is given by a very
exciting speaker.

Rule 1: ( ( material = interesting
            AND
            visuals = good
            AND
            audience = interested )
          OR
          ( speaker = exciting ) )
        -->
        talk = success

Confidence that (material = interesting) = 0.75 
Confidence that (visuals = good) = 0.6 
Confidence that (audience = interested) = 0.88 
Confidence that (speaker = exciting) = 0.33 

Combined confidence: 

first clause, min(0.75, 0.6, 0.88) = 0.6 
second clause =0.33 

first OR second clause, max(0.33, 0.60) = 0.60 

(AND Rule: A chain is no stronger than its weakest 
link. ) 
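
The combination above can be written out as a short sketch. The function names `fuzzy_and` and `fuzzy_or` are illustrative and are not FLOPS syntax; they implement only the min/max rules shown on the slide.

```python
def fuzzy_and(*confidences):
    """AND of clauses: a chain is no stronger than its weakest link."""
    return min(confidences)

def fuzzy_or(*confidences):
    """OR of clauses: take the strongest alternative."""
    return max(confidences)

# Confidences from the slide.
confidence = {
    "material = interesting": 0.75,
    "visuals = good": 0.6,
    "audience = interested": 0.88,
    "speaker = exciting": 0.33,
}

first_clause = fuzzy_and(
    confidence["material = interesting"],
    confidence["visuals = good"],
    confidence["audience = interested"],
)
second_clause = confidence["speaker = exciting"]
talk_success = fuzzy_or(first_clause, second_clause)
```

Here `talk_success` takes the value of the weakest condition in the stronger of the two clauses.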


APPLICATION: PATTERN RECOGNITION 
General Scheme: 

1. Feature extraction by procedural language 
programs . 

2. Assign word descriptors to features using fuzzy 
sets to handle ambiguities. 

3. Assign preliminary classifications using fuzzy 
sets to handle contradictions. 

4. Resolve contradictory preliminary 
classifications to obtain final classifications. 

Usually means pulling in additional information as 
required. 


ASSIGNING WORD DESCRIPTORS TO NUMERIC FEATURES: 


AREA of image region to be classified = a numeric 
feature 

SIZE of image region is a fuzzy set of word 
descriptors : 

SIZE = {TEENY SMALL MEDIUM LARGE HUGE} 

RULE (in English): if for any region the AREA is 
approximately less than or equal to a fuzzy 100 plus or 
minus 50 then the SIZE is TEENY. 

In FLOPS:

rule ( region ^area ~<= 100,50,0 )

--> modify 1 ^size.TEENY ;

Other descriptors: 

xbar = numeric feature, 

xpos = {FAR_LEFT LEFT CENTER RIGHT FAR_RIGHT} = fuzzy 
set 

ybar = numeric feature, 

ypos = {HIGHEST HIGH MIDDLE LOW LOWEST} =fuzzy set 


CLASSIFICATION RULES: 


Classification fuzzy set used in echocardiogram 
classification: 

class = {LV RV LA RA LV+LA RV+RA ARTIFACT PAPILLARY 

...} 

RULE (in English) If in any region the size is SMALL 
and the x-position is CENTER and the y-position is 
HIGHEST then it is likely to be an ARTIFACT. 

In FLOPS:

rule ( region ^size.SMALL ^xpos.CENTER
       ^ypos.HIGHEST )

-->

modify 1 ^class.ARTIFACT ;


MATCHING OBSERVED PATTERN AGAINST LIBRARY PATTERN 


For illustration, we match only one fuzzy set, that for 
size. In general, more than one fuzzy set would be 
simultaneously matched. 

Fuzzy Set Size: 


Member         Observed       Pattern_1      Observed AND Pattern_1
               Grade-Of-      Grade-Of-      Grade-Of-
               Membership     Membership     Membership

Very Large     0              0.15           0
Large          0.04           1.00           0.04
Medium         1.00           0.45           0.45
Small          0.65           0              0
Very Small     0              0              0

Confidence in match = match on Very Large OR Large ....
                    = max(0, 0.04, 0.45, 0, 0)
                    = 0.45

= grade of membership of PATTERN_1 in fuzzy set of 
classifications . 

(We are on our way toward a simple and reliable 
pattern-matching technique, making use of the 
ambiguities in the word descriptor fuzzy sets.) 
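
The table computation can be sketched in a few lines. The dictionary layout and the function name `match_confidence` are assumptions made for illustration; the grades are the ones in the table.

```python
observed = {
    "Very Large": 0.0, "Large": 0.04, "Medium": 1.00,
    "Small": 0.65, "Very Small": 0.0,
}
pattern_1 = {
    "Very Large": 0.15, "Large": 1.00, "Medium": 0.45,
    "Small": 0.0, "Very Small": 0.0,
}

def match_confidence(observed, pattern):
    """AND (min) the grades member by member, then OR (max) across members."""
    return max(min(observed[m], pattern[m]) for m in observed)

confidence = match_confidence(observed, pattern_1)
```

Matching several fuzzy sets at once would, under the same rules, take the minimum of the per-set confidences.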


FUZZY PROCESS CONTROL 


Especially useful when we do not have a mathematical model of the 
process . 

Block Diagram (signal flow):

Set Point, Process Output --> Error --> A/D --> Computer (Control Program)
--> D/A --> Controller Output --> Process

SIMPLE PROCESS CONTROL PROGRAM: 


1. Convert error, rate of change of error to fuzzy set
of word descriptors such as NEGATIVE_SMALL,
POSITIVE_LARGE (fuzzification).

2. Use control rules to obtain fuzzy set of word 
descriptors for controller output. 

3. Convert fuzzy set for controller output to voltage 
(defuzzification) . 

Typical control rules: 

IF error is POSITIVE_SMALL and rate is ZERO then 
controller-output is NEGATIVE_SMALL . 

In FLOPS:

rule ( process ^error.P_SMALL ^rate.ZERO )
     ( controller )

-->

modify 2 ^output.N_SMALL ;
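
The three steps of the simple process control program can be sketched end to end. Everything specific here is an assumption made for illustration: the triangular membership shapes, the normalized [-1, 1] scale, the three-rule table and the weighted-average defuzzification are not taken from FLOPS.

```python
def tri(x, left, peak, right):
    """Triangular membership function (an assumed shape)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical word descriptors as (left, peak, right) on a [-1, 1] scale.
SETS = {
    "N_SMALL": (-1.0, -0.5, 0.0),
    "ZERO": (-0.5, 0.0, 0.5),
    "P_SMALL": (0.0, 0.5, 1.0),
}

# (error descriptor, rate descriptor) -> controller-output descriptor
RULES = {
    ("P_SMALL", "ZERO"): "N_SMALL",
    ("ZERO", "ZERO"): "ZERO",
    ("N_SMALL", "ZERO"): "P_SMALL",
}

def fuzzify(x):
    """Step 1: numeric value -> fuzzy set of word descriptors."""
    return {name: tri(x, *abc) for name, abc in SETS.items()}

def control(error, rate):
    e, r = fuzzify(error), fuzzify(rate)
    # Step 2: fire all rules; AND is min, repeated conclusions combine by max.
    out = {}
    for (e_name, r_name), o_name in RULES.items():
        strength = min(e[e_name], r[r_name])
        out[o_name] = max(out.get(o_name, 0.0), strength)
    # Step 3: defuzzify as the grade-weighted average of the set peaks.
    total = sum(out.values())
    if total == 0.0:
        return 0.0
    return sum(grade * SETS[name][1] for name, grade in out.items()) / total

output = control(error=0.5, rate=0.0)
```

A positive error with zero rate of change yields a negative controller output, and intermediate errors blend neighbouring rules smoothly.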


FLOPS: A FUZZY EXPERT SYSTEM SHELL.


Features : 

(1) Deductive reasoning is emulated by a 
conventional sequential-rule-firing mode; inductive 
reasoning is emulated by a unique parallel-rule-firing 
mode which in turn emulates a non-von-Neumann parallel 
computer.

(2) Data types include integers; floats; strings; 
fuzzy numbers; fuzzy sets and confidence levels. 

(3) Two external file types are provided: Type I, 
FLOPS programs and commands; and Type II, "flat file" 
relational data base format. 

(4) External programs written in any language may 
be called in the same manner as a DOS call: program 
name plus command string. With (3), provides a 
blackboard system. 

(5) A basic truth maintenance system is provided 
based on monotonic fuzzy logic; this may be overridden 
by the programmer to provide fully non-monotonic logic. 

(6) Backtracking is fully automatic in sequential 
mode. Since in parallel mode all fireable rules are 
fired concurrently, backtracking is irrelevant. 


SUMMARY 


(1) Fuzzy systems theory permits handling 
uncertainties, ambiguities and contradictions in a 
mathematically convenient and rigorous fashion. It may 
be used both in procedural and non-procedural 
languages. When employed in an expert system, a system 
shell should be written or selected which incorporates 
these basic features: 

Confidence factors for strings, floats and 

integers ; 

Discrete fuzzy sets; 

Fuzzy numbers; 

Approximate numerical comparison operators . 

(2) Although expert production systems are too slow to 
permit their unassisted use in most online 
applications, they may be used in conjunction with 
procedural language programs in a blackboard system to 
combine the reasoning skills of an expert system with 
the computational ability of procedural language 
programs . 

(3) While fuzzy techniques are very powerful, they are 
unfamiliar to most American engineers and scientists. 
Study and practice in their use is required. 



N91-71366


Maria Zemankova, Ph.D. 

University of Tennessee 
Knoxville, Tennessee 


Dr. Zemankova received her B.S. degree from the American University in Cairo 
in 1977, majoring in mathematics and computer science with a minor in psy- 
chology. Her M.S. and Ph.D. degrees were in computer science from Florida 
State University in 1979 and 1983, respectively. In 1983, she worked on the 
design and implementation of a relational data base "team-up" at Unlimited 
Processing, Inc., in Jacksonville, Florida. Since 1984, Dr. Zemankova has
been an assistant professor in the Department of Computer Science at the 
University of Tennessee, Knoxville, and is also engaged in research collabo- 
ration with the Oak Ridge National Laboratory. She spent January to August 
1987 as a visiting research assistant professor in the Department of 
Computer Science, University of Illinois, Urbana. Her research interests 
include intelligent information systems, machine learning, reasoning under 
uncertainty, and knowledge representation. Dr. Zemankova has authored a book 
entitled "Fuzzy Relational Data Bases - a Key to Expert Systems" (1984), and 
several papers on fuzzy intelligent information systems, and on applications 
of fuzzy set and rough set theories to information systems design and 
machine learning. Her professional affiliations include AAAI, ACM, IEEE,
IFSA, NAFIPS, and Sigma Xi, and she is an associate editor of the
International Journal of Approximate Reasoning. 


INTELLIGENT INFORMATION SYSTEMS WITH LEARNING CAPABILITIES 

Abstract 

An intelligent information system is designed to derive information (that 
may not be explicitly stored in the data base) by application of rules for 
inferring plausible answers to queries. The system is divided into the 
knowledge base (KB) and the inference engine (IE). The KB can be further 
partitioned into a factual base (FB) and an explanatory base (EB). The FB 
is used for storing facts (data) that may be imprecise or incomplete, and 
the EB contains knowledge; i.e., flexible (fuzzy) concepts, relationships, 
or rules that are used to interpret the available data. The IE is designed 
to perform flexible reasoning. Clearly, the "intelligence" of the system 
depends on the knowledge available in the KB and the types of inferences 
that the IE is capable of performing. An experimental system (APPLAUSE) is 
discussed, together with demonstration of system function in the knowledge 
acquisition and querying modes, including its explanatory capabilities. 
THEME


• Make Machines Behave Intelligently 
Marks of intelligence: 

• Learning Capability 

• Reasoning with Insufficient, Unreliable or Imprecise Data 

• Reasoning Under Resource Constraints 


• Creativity- Discovery 


PLAUSIBLE REASONING 


• A core theory proposed by Collins and Michalski

• Modifications and Implementation 

MODEL 

• Hierarchical Organization of Knowledge 

• Mechanism to Manipulate Incomplete and Uncertain 
Knowledge Base 

• Domain Independent Inference Mechanism 

• Theory is Operationalized with the Chemical Periodic Table
as a Test Domain


KNOWLEDGE REPRESENTATION 


ELEMENTS OF EXPRESSIONS 


• Arguments 

• Descriptors 

• References 

• Terms 

• Facts - veracity $\mu$, frequency $\phi$, confidence $\gamma$

• Dependency - forward-backward dependencies

• Implication - forward-backward implications

• Hierarchies - generalization, specialization

• Similarity - context, dominance, typicality


ELEMENTS OF EXPRESSIONS 


Descriptor: $d_i$

breed            attribute
temperature      attribute, function
flies            predicate


Terms: $d_1(a_1)$, or $d_2(a_1, a_2, \ldots)$

breed(Fido)
temperature(latitude, altitude)
temperature(place)


References: $r_1$, $\{r_1, \ldots\}$

4                integer
true             logical
group8           hierarchical

Factual statements: $d_1(a_1) = r_1 : [\mu, \gamma_\mu, \phi, \gamma_\phi]$


• $\mu$ - veracity: Veracity indicates the degree to which reference
$r_1$ is applicable to the descriptor-argument pair.

• $\phi$ - frequency: Frequency indicates the proportion of the argument
for which the reference is a valid description of the
descriptor-argument pair.

• $\gamma_\mu$, $\gamma_\phi$ - confidences in $\mu$, $\phi$.


Examples: 

density (aluminum) = 2.7 : [1, .99, 1, 1] 

is-old(john) = yes : [.7, .9, 1, 1] 

engine_type(car) = 4_cylinder : [1, 1, .8, .95]

Dependency between terms: 


$d_1(a_1) \leftrightarrow d_2(a_2) : [\alpha, \gamma_\alpha, \beta, \gamma_\beta]$

• $\alpha$, $\beta$ - forward and backward dependency strengths

is-philosopher(X) <--> is-greek(X) : [.5, .8, .0001, .8]


Implications between factual statements: 


$d_1(a_1) = r_1 \Leftrightarrow d_2(a_2) = r_2 : [\alpha, \gamma_\alpha, \beta, \gamma_\beta]$

grain(place) = rice <=> rain(place) = [80..120m] : [.9, .9, .5, .8]

The implications can also be encoded by functions:

$d_1(a_1) = r_1 \Leftrightarrow d_2(a_2) = f(r_1)$

radius(circle) = r <=> area(inscribed square) = $2r^2$ : [1, 1, 1, 1]


The transforms A GEN, A SPEC, A SIM, R GEN, R SPEC, 
R SIM allow traversal in a hierarchy, in the process of infer- 
ence. Simplified (no parameters) applications of the trans- 
forms are given below. 




• A GEN

speed(computer_1) = slow
micro_computer = gen(computer_1): cx = alu_size
alu_size(COMPUTER) <--> speed(COMPUTER)
micro_computer = spec(COMPUTER)
speed(micro_computer) = slow



A GEN Transformation 











• A SPEC

height(basketball_player) = tall
karim = spec(basketball_player)
height(karim) = tall


• A SIM 

economy(singapore) = strong 

hongkong = sim(singapore): cx = economic structure 
economy(hongkong) = strong 


• R GEN

reacts_with(potassium) = chlorine
group7 = gen(chlorine)
reacts_with(potassium) = group7


• R SPEC

likes(mary) = carbonated_drinks
coke = spec(carbonated_drinks)
likes(mary) = coke


• R SIM

habitat(whales) = atlantic_ocean
pacific_ocean = sim(atlantic_ocean)
habitat(whales) = pacific_ocean


• Combination of A SIM and R GEN

reacts_with(potassium) = chlorine
sodium = sim(potassium)              (A SIM)
reacts_with(sodium) = chlorine
group7 = gen(chlorine)               (R GEN)
reacts_with(sodium) = group7






Theory of Plausible Reasoning and its 
Implementation. 


Collins and Michalski introduced a theory to model human
plausible reasoning. APPLAUSE is an implementation of an
extended and modified version of the theory. The methodology
is well suited to reasoning in domains where
knowledge is organized hierarchically. The theory provides
mechanisms to manipulate the knowledge base when knowledge
is incomplete or uncertain. Some features of the
theory are highlighted with examples from the chemical periodic
table.



QUERIES 


Form: 

descriptor(argument) = ref? / [$\mu, \gamma_\mu, \phi, \gamma_\phi$]?     (1)


Aim:

In query (1) the system is to retrieve the best reference value
together with the estimated parameters. The best reference
is the one with the highest $\mu * \gamma_\mu * \phi * \gamma_\phi$ product.

• Type checking is performed for arguments and descrip- 
tors and references when applicable. 
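
As an illustration of how the "best reference" could be chosen, the sketch below ranks candidate answers by the product of their four parameters. The `Candidate` class and the competing "no" answer are hypothetical; only the `is-old(john) = yes` parameters come from the examples given earlier.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Candidate:
    """One candidate reference with its [mu, gamma_mu, phi, gamma_phi] parameters."""
    reference: str
    mu: float
    gamma_mu: float
    phi: float
    gamma_phi: float

    def score(self):
        # Product used to rank candidate references.
        return prod((self.mu, self.gamma_mu, self.phi, self.gamma_phi))


def best_reference(candidates):
    """Return the candidate with the highest parameter product."""
    return max(candidates, key=Candidate.score)


candidates = [
    Candidate("yes", 0.7, 0.9, 1.0, 1.0),  # is-old(john) = yes, from the slide
    Candidate("no", 0.3, 0.9, 1.0, 1.0),   # hypothetical competing answer
]
best = best_reference(candidates)
print(best.reference, best.score())
```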


ALGORITHM for processing Queries: 

• get_query(Q)

• if (get_fact(Q) successful) then
  - report retrieved information, exit.

• elseif reasoning_depth_counter > depth_limit then
  - combine whatever evidence is available and exit.

• else

  - increment depth_counter by one.

  - Dep <- set of dependencies/implications such that the
    descriptor occurs in the RHS and $\alpha * \gamma_\alpha$ > threshold T.

  - sort dependencies and implications according to
    decreasing $\alpha * \gamma_\alpha$ (gather strongest evidence first).

  - repeat until no more dependencies:

    * evaluate LHS of dependency or implication. If
      necessary call this routine to evaluate LHS.

    * apply suitable transforms such as A GEN,
      A SPEC, A SIM and compute RHS, decrement
      depth_counter by one and exit.

• combine evidences:

  - choose best $\gamma_\mu$, or $\mu * \gamma_\mu * \phi * \gamma_\phi$ products, for type 1
    or type 2 query respectively.

  - compute union and intersection of ranges to give upper
    and lower bounds on the range of the conclusion
    respectively.
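
The control flow of the algorithm can be sketched in a few lines of Python. This is a heavily simplified illustration, not the APPLAUSE implementation: the `FACTS` and `RULES` structures are hypothetical, a single confidence value stands in for the four parameters, the hierarchy transforms are omitted, and the first successful rule is returned instead of combining evidence.

```python
# Hypothetical factual base: (descriptor, argument) -> (value, confidence).
FACTS = {("period", "Kr"): (4, 1.0)}

# Hypothetical implications in functional form:
# (LHS descriptor, RHS descriptor, function, alpha, gamma_alpha).
RULES = [
    ("period", "boiling_point", lambda p: -317 + 41.8 * p, 0.9, 0.9),
]


def query(descriptor, argument, depth=0, depth_limit=3, threshold=0.5):
    """Return (value, confidence) for descriptor(argument), or None if nothing is found."""
    fact = FACTS.get((descriptor, argument))
    if fact is not None:
        return fact                      # stored fact: report it directly
    if depth >= depth_limit:
        return None                      # reasoning depth exhausted
    # Keep rules whose RHS is the queried descriptor and that are strong enough.
    usable = [r for r in RULES if r[1] == descriptor and r[3] * r[4] > threshold]
    usable.sort(key=lambda r: r[3] * r[4], reverse=True)   # strongest evidence first
    for lhs, _, fn, alpha, gamma_alpha in usable:
        lhs_result = query(lhs, argument, depth + 1, depth_limit, threshold)
        if lhs_result is not None:
            value, confidence = lhs_result
            return fn(value), confidence * alpha * gamma_alpha
    return None


print(query("boiling_point", "Kr"))
```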


EXAMPLE 


Given: 

• Group8a consists of gases [He, Ne, Ar, Kr, Xe, Rn].

• Boiling points of only 4 gases are known. 

[He/-269, Ne/-246, Ar/-185, Xe/-108]. 

Query: Find boiling point of Kr. 

Process: 


• Statistical analysis will be made to see if it is reasonable 
to aggregate the boiling points into a range and propa- 
gate it to a parent node. 

• Kr has two parents, group8a and period4. 

• Suitability of generalization is tested in both hierarchies. 


• The better one is selected for inference. 




Two possible hierarchies for periodic table 









Criteria for generalization: 


• Low standard deviation of residuals 

• Generalize from large number of points. 

• Low range of residual errors (fewer outliers) 

• Presence of functional dependencies with characteristics 
similar to those in the neighboring classes. 


• 'Causal connection' 




Total # of Points = 100 
Total # of Variables = 3 

Names of the attributes GROUP PERIOD ATNUM 
A suitable hierarchy is to be decided for attribute ATNUM 

Evaluating Group as the Primary Attribute for Classification 
Total Number of Distinct Classes = 18 












Evaluating Period as the Primary Attribute for Classification 
Total Number of Distinct Classes = 7 








Generalization                         group8a           period4

BP range                               [-108 .. -269]    [58 .. 3450]
slope m                                42                -416
std dev of residuals $\sigma_y$        5%                46%
% intersection with
neighboring class x                    60%               100%
number of points n                     4                 17
Computed $\alpha$                      0.88              0.8
Computed $\gamma_\alpha$               0.93              .3


The equation (implication in a functional form) derived by 
best line fit method: 


BP = -317 + 41.8*period (valid for group8a).
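
The fitted line can be reproduced with an ordinary least-squares fit over the four known boiling points. The sketch below assumes that "best line fit" means ordinary least squares and uses the standard period numbers of He, Ne, Ar and Xe, which the slide does not list.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Known group8a boiling points from the slide: He, Ne, Ar, Xe.
# Period numbers (1, 2, 3, 5) are standard periodic-table facts (assumed here).
periods = [1, 2, 3, 5]
boiling_points = [-269, -246, -185, -108]

intercept, slope = least_squares_line(periods, boiling_points)
bp_kr = intercept + slope * 4  # Kr is in period 4
print(f"BP = {intercept:.2f} + {slope:.2f}*period; BP(Kr) = {bp_kr:.2f}")
```

With unrounded coefficients the Kr estimate differs slightly from the value obtained with the rounded coefficients -317 and 41.8.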

The parameters $\alpha$ and $\gamma_\alpha$ are estimated by evaluating
compliance with the criteria for generalizing.


$\alpha = (1 - 0.2 * x)$

7 a = 0.5 * ( ” W ~ m ) + 0.4 * (1 - 0>) + 0.1 * 

max Homin' \7~l A ) 

Tabulated factors favor generalization in group8a rather than 
in period 4. 


















PARAMETERS FOR THE DERIVED CONCLUSION: 


Derivation using A SPEC without functional dependency 
BP(Kr) = [-108, -269] 

Parameters [$\mu, \gamma_\mu, \phi, \gamma_\phi$] are directly inherited from the parent
node; however, the precision of the answer is low.

The answer is made more precise (narrower range) by using
functional dependencies discovered in the related elements.

Derivation using A SPEC together with functional dependency.

Assume $\tau$ and $\gamma_\tau$ for Kr within group8a = .9, .95
These can be estimated by evaluating common relevant features
among the siblings.

BP(Kr) = -317 + 41.8*4 = -149.8

$\mu_c = \mu_1 = 1$

$\gamma_{\mu c} = \gamma_{\mu 1} * \alpha * \gamma_\alpha * \tau * \gamma_\tau$

= 1 * .9 * .9 * .9 * .95 = 0.6925

$\phi_c = \phi_1 = 1$

$\gamma_{\phi c} = \gamma_{\phi 1} * \alpha * \gamma_\alpha * \tau * \gamma_\tau$

= 1 * .9 * .9 * .9 * .95 = 0.6925
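
The confidence propagation above is a plain product of the inherited confidence with the dependency and typicality terms. A minimal sketch, with a hypothetical function name and the slide's rounded values as inputs:

```python
from math import prod


def propagate_confidence(gamma_parent, alpha, gamma_alpha, tau, gamma_tau):
    """Confidence of a conclusion derived by A SPEC with a functional dependency.

    Multiplies the inherited confidence by the dependency strength, its
    confidence, the typicality, and the typicality confidence.
    """
    return prod((gamma_parent, alpha, gamma_alpha, tau, gamma_tau))


# Values used on the slide: gamma = 1, alpha = .9, gamma_alpha = .9,
# tau = .9, gamma_tau = .95.
gamma_c = propagate_confidence(1.0, 0.9, 0.9, 0.9, 0.95)
print(gamma_c)
```

The product is about 0.69255, which the slide reports as 0.6925.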




Derivation using A SIM transform. 


Find elements similar to Kr in some context which affects 
boiling point. 

Suppose, relevant context is 

CX = (.7*group + .3*period) Rule 1 

and the dependency is given by, 

CX -> boiling point: $\alpha$ = .75, $\gamma_\alpha$ = 1;     Rule 2

• $\alpha$ and $\gamma_\alpha$ estimated by global multiple regression analysis.

• Localize the search space within the neighborhood of the
argument in question

• Similarity $\sigma$ and $\gamma_\sigma$ are computed according to the
formulas:

$\sigma(Arg_1, Arg_2) = \sum_i W_i * \sigma_i(attr_i(Arg_1), attr_i(Arg_2))$

$\gamma_\sigma(Arg_1, Arg_2) = \sum_i W_i * \gamma_{\sigma i}(attr_i(Arg_1), attr_i(Arg_2))$

where the weights $W_i$ are normalized such that the sum
of weights is 1.


Assuming pairwise similarity $\sigma$ and $\gamma_\sigma$ values

$\sigma$(gr8a, gr7a) = .2; $\gamma_\sigma$ = .95

$\sigma$(per4, per3) = .8; $\gamma_\sigma$ = .95

$\sigma$(per4, per5) = .7; $\gamma_\sigma$ = .95

and given the context in Rule 1, we get



Elem   Group   Period   $\sigma$(Kr, Elem)          $\gamma_\sigma$(Kr, Elem)

Ar     8a      3        .7*1 + .3*.8 = .94          .95
Xe     8a      5        .7*1 + .3*.7 = .91          .95
Cl     7a      3        .7*.2 + .3*.8 = .38         .95
Br     7a      4        .7*.2 + .3*1 = .44          .95
I      7a      5        .7*.2 + .3*.7 = .35         .95


Disregard elements with $\sigma * \gamma_\sigma$ < threshold T.
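
The similarity column can be recomputed from Rule 1 and the pairwise similarities. In the sketch below the threshold value T = 0.5 is an assumption (the slide does not give it), and the dictionary layout is illustrative only.

```python
# Pairwise similarities from the slide, relative to Kr (group 8a, period 4).
GROUP_SIM = {"8a": 1.0, "7a": 0.2}
PERIOD_SIM = {4: 1.0, 3: 0.8, 5: 0.7}

W_GROUP, W_PERIOD = 0.7, 0.3   # context weights from Rule 1
GAMMA_SIGMA = 0.95             # confidence in each similarity value
THRESHOLD = 0.5                # hypothetical threshold T, not given in the source

ELEMENTS = {                   # element -> (group, period)
    "Ar": ("8a", 3),
    "Xe": ("8a", 5),
    "Cl": ("7a", 3),
    "Br": ("7a", 4),
    "I": ("7a", 5),
}


def similarity_to_kr(group, period):
    """Weighted sum of group and period similarity to Kr."""
    return W_GROUP * GROUP_SIM[group] + W_PERIOD * PERIOD_SIM[period]


sigma = {name: similarity_to_kr(g, p) for name, (g, p) in ELEMENTS.items()}
kept = [name for name, s in sigma.items() if s * GAMMA_SIGMA >= THRESHOLD]

for name, s in sigma.items():
    print(name, round(s, 2))
print("kept:", kept)
```

Under the assumed threshold only Ar and Xe survive, which matches the two elements used in the derivation that follows.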

























Similarity transform reference and parameter estimation: 


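BP(Kr) = BP(Element) : [$\mu, \gamma_\mu, \phi, \gamma_\phi$]

Ar: BP(Ar) = -185 [$\mu = 1, \gamma_\mu = 1, \phi = 1, \gamma_\phi = 1$]

BP(Kr) = BP(Ar) = -185, $\sigma$ = .94

$\mu_c = \mu = 1$

$\gamma_{\mu c} = \gamma_\mu * \sigma * \gamma_\sigma * \alpha * \gamma_\alpha$

= 1 * .94 * .95 * .7 * 1 = .625

$\phi_c = \phi = 1$

$\gamma_{\phi c} = \gamma_\phi * \sigma * \gamma_\sigma * \alpha * \gamma_\alpha$

= 1 * .94 * .95 * .7 * 1 = .625

similarly,

Xe: BP(Xe) = -108 [1, 1, 1, 1]

BP(Kr) = BP(Xe) = -108, $\sigma$ = .91

$\mu_c = 1$

$\gamma_{\mu c}$ = 1 * .91 * .95 * .7 * 1 = .605

$\phi_c = 1$

$\gamma_{\phi c}$ = 1 * .91 * .95 * .7 * 1 = .605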


COMBINATION OF EVIDENCES 


- Take the reference value of the result as the weighted
average value of the BPs, where the weights are decided
by the $\sigma * \gamma_\sigma$ product.

- The parameters are taken as the weighted average of
the evidences.

BP(Kr) = [BP(Ar)*$\sigma$(Kr,Ar) + BP(Xe)*$\sigma$(Kr,Xe)] / ($\sum_i \sigma_i$)

BP(Kr) = (-185*.94 + -108*.91)/(.94 + .91) = -147.1

$\mu_c = \sum_i \mu_i / N$ (if $\Delta\mu$ not large)

= (1 + 1)/2 = 1

$\gamma_{\mu c} = \sum_i \gamma_{\mu i} * \sigma * \alpha * \gamma_\alpha / N$

= (1*.94*.75*1 + 1*.91*.75*1)/(1 + 1) = .6175

$\phi_c = \sum_i \phi_i / N$

= (1 + 1)/2 = 1

$\gamma_{\phi c} = \sum_i \gamma_{\phi i} * \sigma * \alpha * \gamma_\alpha / N$

= (1 * .94 * .75 * 1 + 1 * .91 * .75 * 1) = .6175
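
The combined reference value is a similarity-weighted average of the two pieces of evidence. A minimal sketch of that one step (the helper name is hypothetical; the confidence parameters are not recomputed here):

```python
def weighted_average(values, weights):
    """Average of values, each weighted by its similarity to the query argument."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Evidence from the slide: BP(Ar) = -185 with sigma = .94, BP(Xe) = -108 with sigma = .91.
bp_kr = weighted_average([-185, -108], [0.94, 0.91])
print(round(bp_kr, 1))
```

Because both similarities carry the same confidence (.95), weighting by the similarity alone gives the same result as weighting by the similarity-confidence product.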



Comparison of parameters: 


Parameter          A SPEC     A SIM      Actual

BP                 -149.8     -147       -152
$\mu$              1          1          -
$\gamma_\mu$       .6925      .6175      -
$\phi$             1          1          -
$\gamma_\phi$      .6925      .6175      -


Choose results obtained by A SPEC since it yields infer- 
ence with higher confidence. 



















CONCLUSION 

Plausible Reasoning provides a useful mechanism to manipulate
the available knowledge base to infer conclusions not arrivable
by traditional logic.
UNDERSTANDING INFORMATION PROCESSING IN
ANIMALS AS A WAY TO BUILDING INTELLIGENT ROBOTS 

Abstract 

One motive for the study of artificial neural systems is that knowledge 
derived from understanding the brain (and its billions of neurons) can be 
exploited and utilized in the building of machines. We hope to build 
machines with observational, adaptive, and manipulative powers resembling 
those of animals and humans. The machines could be used to reduce human 
exposure to conditions hostile to humans. Severe, dangerous, and unknown 
environments; long missions to remote locations; and critical, but tedious 
tasks represent such conditions. Fundamental understanding of nervous 
systems may be our only hope for building such machines, as no machine built 
on other principles has come close to achieving this goal. 
gence (particularly expert systems technology). In 1981, Mr. Cruz founded a

applications, and implementation techniques. In particular, it led to the

KNOWLEDGE PROCESSING USING NEURAL NETWORKS
Abstract 

Artificial neural networks (ANN's) are novel information processing 
mechanisms. Such networks are based on formal models of the function and 
organization of biological nerve nets. Neural nets possess several traits 
which make them an attractive substrate for artificial intelligence applica- 
tions. They are an inherently parallel processing mechanism promising great 
speed of operation. In addition, most artificial neural nets are adaptive 
in that the behavior of the network may change over time as a function of 
its "experience" (cumulative operating history). Traditional AI involves
the emulation of very high-level behaviors and cognitive mechanisms. Neural 
nets are a comparatively simple, low-level mechanism. Thus, applying them 
to complex tasks requires the resolution of a great many issues. These 
include problems of knowledge representation, definition and implementation 
of inference mechanisms, and control of the inference process. This presen- 
tation will identify some of the particular issues to be addressed, as well 
as some possible approaches and early results in ANN-based knowledge
processing.

Overview


Project goal - explore flow-of-activation networks (FAN) 

as a new way to represent and process information: 

• Inspired by function and organization of biological 
nerve nets. 

• Use many simple processors to collectively 
perform operations (rather than one, or a few, 
processors operating independently). 

• System performs "pattern processing" operations,
rather than "symbolic" or "numeric" processing.

• FANs can be adaptive; network behavior can
change over time.

• Simple processors act as "smart memory" — 
system's processors and memory are combined. 

• A FAN is a static network of "nodes" (processors) 
and "links" (communication paths). Acts like a 
"circuit" performing some function, rather than a 
sequence of procedure calls and data-structure 
instantiations. 
[Figure: taxonomy of computation networks]

GENERAL COMPUTATIONS

  DYNAMIC COMPUTATION NETWORKS
    * Processes and data-flow paths are created and destroyed during run

    DATA-FLOW NETWORKS
      * Pass arbitrary structured data over paths between arbitrary processes

  STATIC COMPUTATION NETWORKS
    * Processes and data-flow paths established before run, unaltered during run

    FAN NETWORKS
      * Point-to-point paths transmit only scalar data (no data structures)
      * Processors operate on scalar local data (e.g. activation levels)

      FAN NETWORKS (NEURAL NETS)
        * Node processors combine local state information
        * Link processors transmit modified state information
        * Particular network topology and node/link processor characteristics

<NETTAX 2/87>







Computing by flow-of-activation 


• A non-von Neumann computing mechanism

• FAN attributes:

♦ Highly parallel (fast)

♦ Simple, uniform (pattern) processing

♦ Can be adaptive ("learning" model)

• Basic behavior:

♦ Static net represents events and responses 

♦ Events "activate" nodes 

♦ Nodes drive other nodes by "flow-of-activation" 

♦ Active nodes trigger actions 

♦ Result: each event and each action processed 
simultaneously in parallel 


ALMADEN 
Some Attributes of Natural Intelligence

• Purpose: to enable an organism to meet its needs 
(goals) in the face of a changing environment 

• Learning is key; "hard-wired", "programmed" behavior 
limits organism's adaptiveness (like "brittle" programs) 

• Nature of environment: 

♦ Contains regular "features" (entities and events)

♦ Entities obey certain laws

♦ Both features and laws exhibit some variability

• Intelligence capitalizes on these traits:

♦ Learn to recognize features, and to associate
significance ("meaning") with them

♦ Predict attributes and behavior of features; build
"world model" of environment

♦ Use "fuzzy processing" to make time-varying best
guesses about current contents of environment;
seek adequate (not necessarily optimal) plan for
meeting goals


KREV0287 


System Embedded in Environment 


• Cycle relating system and environment:

1. Event occurs in environment (feature appears); 

2. System detects external event (internal event 
occurs); 

3. Internal event triggers system response; 

4. Response causes changes in environment 
(external events). 

• System needs internal representation of external reality:

♦ "Features" to represent external entities

♦ "Relationships" to represent regularities between
entities

♦ "Operations" to embody responses to external
events

♦ "Rules" to associate specific responses with
specific external or internal events


KREV0287 









Establishing Representations 


• Purpose: set up equivalences (mappings) between
external and internal entities:

♦ External entities have regularity (content)

♦ Presence of entity constitutes an event

♦ Internal events produced by "detection" of external
events (through fixed "transduction" mapping)

• Two basic schemes are possible:

♦ "Symbolic" representation: assign a "symbol"
(referent) to "stand for" (point to a description of)
represented entity. This scheme separates the
definition (meaning) of an entity from its referent,
coupling the two through an (arbitrarily assigned)
association. No relationship (e.g. transduction)
required between content of external entity and
form of internal representation pointed to by
symbol.

♦ "Semantic" representation: map the "transduction"
of an external entity onto an internal "analog".
This maps content of external event onto
structure of internal representation. The mapping
is fixed by transduction process (not arbitrary
association). Mapping replaces use of referent
(pointer).
Standard AI Techniques


Based on "symbolic processing":

• Processing uses descriptions (formal models),
rather than analogs - incomplete information

• Processing based on algorithms (sets of explicit,
detailed rules) — missing rules problematic, no 
generalization possible 

• Data content (meaning) not used in processing; 
formal manipulation only — only programmer has 
knowledge, or "interpretation", of data's "meaning" 

• Through above, processing uses only explicit 
relationships — cannot use implicit relationships, 
such as "similarity" between data 

• Through above, "learning" (i.e. changes in 
knowledge-base) requires mediation by centralized 
"performance assessor" — inherently a serial, 
high-level function 

• Processing is normally precise , as are symbols — 
no role for ambiguity, like that in fuzzy sets or 
approximate reasoning methods 


• Control/decision-making normally centralized -
parallel processing difficult, therefore slow


Intelligence via Artificial Neural Systems 


Based on "semantic processing": 

• Processing based on analogs of entities being 
reasoned about — acquired through learning or 
detailed formal specification, thus can contain 
complete information 

• Processing based on structure (contents) of
knowledge-entities (KEs), plus connections
(relationships) between them - inter-KE
connections cause "rule-like" behavior.
Generalization possible through "approximate"
satisfaction of rules

• Data content (therefore structure) constrains 
inferences a KE can participate in — uniquely 
determines "interpretation" of KE's "meaning". No 
formal manipulation, just inference via 
"flow-of-activation" 

• No implicit relationships, just explicit ones (through 
inter-KE connections) — "similar" KEs have similar 
(largely-shared) structure 

• "Learning" occurs as local, low-level function
(through change in inter-KE coupling) - no central
"performance assessor" needed (though one may
be used; e.g. an attention mechanism)

• Both processing and KEs are normally imprecise
(but can be made arbitrarily precise) -
approximate reasoning is the norm

• Decisions made through locally-controlled
flow-of-activation - processing is inherently parallel
and fast (but can be made serial)


What Are Knowledge Representation 
Networks (KRN)? 


• An attempt to emulate some biological information
processing capabilities ("intelligence"):

♦ Process multi-modal input (sensory processing)

♦ Produce flexible output activity (distributed motor
control)

♦ Intelligent decision-making (cognitive functions;
adaptive planning using a learned "world model")

• Assumption: choice of low-level implementation medium
(FAN) is important:

♦ Must encompass above three types of functions

♦ Underlies important functional capabilities
(associative storage, learning, distributed
processing and memory, etc.)

♦ Draw insight from biological organizational
principles
KRN ARCHITECTURE


• Basic knowledge units (features, operations and
relationships) are represented by KRN networks

• Inferences are performed through interactions between
knowledge units

• Underlying KRN has static structure (no additions or
deletions of nodes or links during inferences)

• Channelled flow of activation creates dynamic
knowledge structures as subsets of overall KRN (like
instantiation of variables)

• FAN is ideal implementation medium for KRN
(parallel network, flow of activation, learning model)


KRN Architecture 


• Overall structure of KRN net — a "feature" hierarchy 
cross-coupled with an "operation" hierarchy (like 
advanced nervous systems): 

♦ Feature hierarchy represents events and entities
which the KRN net can recognize. Conjunctions
of features define higher-level (more complex)
features.

♦ Operation hierarchy represents operations (actions)
which the net is capable of executing in response
to detected features. Operations are decomposed
into sequences of sub-operations.

♦ Cross-coupling from features to operations
provides "rule-like" firing of operations as
triggering features become active.

♦ Cross-coupling from operations to features
activates "expectations" of state-changes which
should result from execution of operations. This
allows operations to execute in "closed-loop"
mode.
Why Build KRNs from FANs?


• Need to represent events (features and operations):

♦ Events are discrete and uniquely-identifiable

♦ Represent events by distinct FAN nodes

♦ Detection of feature has a "certainty" measure
(strength of belief); execution of operation has a
"priority" measure (degree of appropriateness)

♦ Encode in (a pattern of) continuous-valued FAN
node activation levels

• Need to represent relationships between events:

♦ Events favor or inhibit other events with some
pair-wise "degree of causality" (implication
strength)

♦ Represent events by distinct (positive or negative)
FAN links
• Adaptiveness alters beliefs about degree of influence
of one event on another:

♦ Modify pair-wise degree of causality

♦ Use FAN-link "learning" mechanism
(e.g. Hebb model)

• Operations produce sequences of events:

♦ Arbitration between candidate operations requires
comparison of candidates' "appropriateness"
measure

♦ Encode in FAN node activation levels

♦ Temporal constraints important in executing some
operations (use network dynamics)

♦ Event sequencing requires "handshaking" (use
directed FAN flow of activation)

• Basic functions to implement behavior
(stimulus-response cycle):

♦ Detect occurrence of events (like IF-clauses; use
FAN "pattern-encoder" circuits)
♦ Internal events represent responses
(like THEN-clauses; use FAN "decoder circuits" to
produce activity patterns)

• Events and operations may occur independently, and
should be processed independently:

♦ FAN nodes and links are asynchronous
independent processors

♦ FAN processing is local (events and operations in
FAN nodes, inter-node dependencies through FAN
links)
Basic KRN "Building-Blocks"


• Features are represented by FAN "encoder" circuits:

♦ Output becomes active IF all required input-feature
values are active (present)

♦ KRN net inputs can drive encoder input features.

♦ Match performed by encoder is "fuzzy": there is
some tolerance for input features within a
satisfactory range, not just one exact value.

♦ Sets of encoders may take input from same set
of input features. Inter-encoder competition is
used to maximize activity of single encoder with
best match to current inputs.

♦ FAN "column" circuit acts as encoder; FAN
"hypercolumn" circuit acts as competing set of
encoders with shared inputs.

• "Decoders" produce a given output activity pattern
across a set of output nodes:

♦ In producing specific network state-transitions,
decoders act like production-rule THEN-clauses.

♦ Decoder input activity level scales output activity
vector.

♦ Decoder outputs can produce output from the
overall KRN net.

• Encoders cascaded into decoders form "fuzzy"
state-machine:

♦ Encoders detect current network state;

♦ Active encoders trigger associated decoders
(via FAN links);

♦ Active decoders cause state-transition (i.e. change
activity of features which feed encoders).

• Local FAN "attention mechanism" can be used to 
"enable" or "disable" sets of encoders and decoders. 
This helps focus processing. (Use and implementation 
of attention is crucial current research area). 
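
The encoder, hypercolumn and decoder roles described above can be illustrated with a small Python sketch. It is not the FAN circuitry itself: the linear tolerance function, the use of MIN to require all input features, the winner-take-all competition, and all names and numbers are assumptions chosen for illustration.

```python
def fuzzy_match(value, target, tolerance):
    """Degree of match in [0, 1]: 1 at target, falling linearly to 0 at +/- tolerance."""
    return max(0.0, 1.0 - abs(value - target) / tolerance)


def encoder_activity(inputs, template, tolerance=0.5):
    """Encoder output: active only to the degree that ALL input features match."""
    return min(fuzzy_match(x, t, tolerance) for x, t in zip(inputs, template))


def hypercolumn(inputs, templates):
    """Competition among encoders with shared inputs: keep only the best match."""
    activities = {name: encoder_activity(inputs, t) for name, t in templates.items()}
    winner = max(activities, key=activities.get)
    return winner, activities[winner]


def decoder_output(activity, pattern):
    """Decoder: the input activity level scales the stored output pattern."""
    return [activity * p for p in pattern]


# Hypothetical feature templates and decoder patterns.
templates = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
patterns = {"A": [1.0, 0.5], "B": [0.0, 1.0]}

winner, activity = hypercolumn([0.9, 0.1], templates)
print(winner, decoder_output(activity, patterns[winner]))
```

Feeding the decoder output back as the next input would give one step of the "fuzzy" state machine.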
FUZZY LOGIC OPERATORS AND NEURON ACTIVATION FIELDS

Abstract 

A neural structure in light of fuzzy sets and operators is examined. During 
a study of underwater acoustic signatures, it was discovered that a simple 
version of the avalanche could be improved for classification purposes by 
adding two simulated hardware memories (or latches) to each neuron. The 
performance of the new structure, called a neuron ring, approximates the 
performance of cross correlation. Only simple operators, such as the sigma- 
count and the triangular norms MAX and MIN are necessary. In brief, the 
pattern exciting the neuron is viewed as simply a means to induce a possi- 
bility distribution in the neuron. The height of the distribution is 
partial support for the hypothesis in question. The summation of support 
after a time sequence of excitation is support for the hypothesis. 


Fuzzy Sets and Neural Networks 

W.R. Taber and R.O. Deich 

General Dynamics, Electronics Division, Box 85310 
San Diego, CA 92138, MZ 7202- K 


INTRODUCTION 

Fuzzy sets [15,17] and their operators 
have interesting applications in neural net- 
works. The performance of fuzzy decision 
rules for pattern classification provides a 
compelling reason to use graded set mem- 
bership functions. As we will show, fuzzy 
operators enable a simple structure, the 
neuron ring, to classify non-stationary pat- 
terns in the presence of severe noise. With- 
out fuzzy processing, the ring is exceptional- 
ly sensitive to noise. Its operation as a shift 
invariant filter is not appropriate. 

The underlying problem with non-fuzzy 
rings is that decision rules examine terminal 
activation instead of the historical activation 
record. 

In this paper, we report the performance 
of neural structures trained with undersea 
ship signatures from San Diego Bay and 
elsewhere. We compare performance to 
cross correlation - a conventional process- 
ing algorithm that seldom fails. Backpropa- 
gation [9,10,13] performance is also com- 
pared both with and without stationarity 
assumptions. 

Reference to ship signatures should not
sidetrack the reader from recognizing the
contributions of fuzzy processing to pattern
recognition. Fuzzy processing may be the
best model for non-stationary patterns -
those patterns that change their descriptive
statistics over time [footnote 1]. That ship acoustic
energy survives to propagate over geographic
distances is amazing. Yet the digital
computer, simulating fuzzy rings, correctly
identifies ships in the presence of pseudo-white,
colored, chirp, and other types of
noise. This being the case, the algorithm
outlined in this paper has intrinsic value
apart from its underwater application.

Footnote 1: See reference 2 for definitions.

FUZZY SET EXAMPLE 

The indicator function of a standard set 
is either a 1 or a 0. Either the set element is 
present or it is not. In distinction, the indica- 
tor function of a fuzzy set admits graded 
membership. An element can be present or 
absent, or it may be present to a degree. A 
simple example will illustrate this concept. 

Suppose a man with a full head of hair 
is the subject of an experiment to quantify 
baldness. The experiment consists of pluck- 
ing a single hair then recording the answer 
to the question, "Is this man bald?" The 
first time, the answer is definitely "no". 
However, continued experiments with the 
same subject and other observers will lead 
to positive answers. The outcome of the 
experiment is justification for statements of 
the form "the probability that this man is 
bald is .50." 

Are there other ways to estimate bald- 
ness? The answer is yes, and fuzzy set the- 
ory provides an approach. 

A fuzzy practitioner views bald individ- 
uals as a set and tries to estimate an indi- 
vidual’s membership from data. For exam- 
ple, he estimates the number of hairs on the 
average head, then estimates the number of 
hairs on the subject’s head. Without solici- 
tation, he is able to use an estimate of the 


number of hairs as the numerator of the ratio 
of the subject’s hair to the average. This is 
an estimate of the approximate baldness 
degree. His membership in the set of bald 
individuals is about the ratio. 

Often, there is no information or even 
requirement to justify probability estimation 
either from a frequentist’s or from a Baye- 
sian’s viewpoint. It is in these situations 
that fuzzy sets offer complementary value. 

THE NEURON RING 

There are a number of connection inten- 
sive networks for pattern classification [7]. 
The Grossberg avalanche [5] cascades neural
elements to learn and recognize spatiotemporal
patterns. In signal processing
terminology, the avalanche recognizes non- 
stationary patterns. Hecht-Nielsen [6] fur- 
ther reduced the connectivity of this struc- 
ture in the commercial SPR (spatiotemporal 
pattern recognizer) feedforward network. 

The neuron ring resembles the SPR but 
has new architectural features. Its closest 
approximation is the torus of Goles [4]. 
The ring’s processing element is the DPNL- 
the dot product neuron with latches that 
hold time (T) and activation (M) values. 
These are visible in figure 1. 

The DPNL operates in the following
manner. During training, the reference pattern
is hardwired (fast-learn mode) into the
neuron as vector Z. During recognition, the
test pattern U is dotted with Z, yielding a
scalar. This quantity initializes an accumulator
in DPNL i. The next operation multiplies
the unit’s old activation, $X^{old}$, by -a. Another
term is computed with decouple gating the
activation of the previous neuron. The activation
sum is multiplied by a gain factor A,
yielding the activation increment $\delta X^{new}$.

Decouple couples a small fraction of 
activation or encouragement forward. When 
zero, the signal enables a special mode for 
spectral classification using permutations of 
firing order as similarity metrics. When 
decoupled, the DPNLs activate indepen- 
dently, leaving an audit trail of firing order. 

Equation (1) relates these quantities 
for DPNL i with a variation in the terminolo- 
gy established by Hecht-Nielsen for the 
SPR. 

Figure 2 illustrates the tertiary struc- 
ture of a ring assembly. The lateral feedfor- 
ward and the single return are its gross fea- 
tures. The input layer is not shown. 

A pattern sequence excites the ring to a 
graded activation level. Rather than invoke 
the all or none firing principle, the DPNL 
exports its activation untouched except for 
hard limiting the value to the unit interval 
[0,1]. The basis for excitation is partial correlation
by dot product, that is, correlation
at zero time lag. For nonnegative pattern vectors
of unit length, the dot product is in the unit interval.
The higher the number, the greater the similarity
between $V_1$ and $V_2$ on the unit hypersphere
in $\mathbb{R}^n$. Each pattern excites every
DPNL. Initially, both latches are reset to 0.
As the patterns arrive at the DPNL, the reg- 
isters latch new values. The max latch, M, 
changes only when the current activation 
exceeds M. 
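
The DPNL behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, the default gains (A = 1, a = 1, b = 0, c = 1) are chosen only so that the activation equals the dot product, the T latch is assumed to record the time slice at which M was last set, and the trailing "+" printed with Equation 1 is not modeled.

```python
from dataclasses import dataclass


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def clip_unit(x):
    """Hard-limit a value to the unit interval [0, 1]."""
    return max(0.0, min(1.0, x))


@dataclass
class DPNL:
    z: list                 # reference vector Z, hardwired during training
    x: float = 0.0          # current activation X
    max_latch: float = 0.0  # M: largest activation seen so far
    time_latch: int = 0     # T: time slice at which M was set (assumed meaning)

    def step(self, u, prev_activation, t, A=1.0, a=1.0, b=0.0, c=1.0):
        # Increment = A * (-a*X_old + b*I1 + c*I2), following the prose description.
        increment = A * (-a * self.x + b * prev_activation + c * dot(u, self.z))
        self.x = clip_unit(self.x + increment)
        if self.x > self.max_latch:      # M changes only when it is exceeded
            self.max_latch = self.x
            self.time_latch = t
        return self.x


def present(ring, patterns, **gains):
    """Gate each time-slice vector to every DPNL; return the max latch array."""
    for t, u in enumerate(patterns, start=1):
        prev = [n.x for n in ring]              # activations before this slice
        for i, n in enumerate(ring):
            n.step(u, prev[i - 1], t, **gains)  # index -1 wraps: re-entrant ring
    return [n.max_latch for n in ring]


ring = [DPNL([1.0, 0.0]), DPNL([0.0, 1.0])]
latches = present(ring, [[1.0, 0.0], [0.0, 1.0]])
```

With these toy inputs each DPNL is fully excited in its own time slice, so the max latch array holds the history even though the first neuron's terminal activation has decayed to zero.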

After complete pattern presentation,
the activation field or max latch array is
analyzed with fuzzy primitives. It is this
historical record and processing that is absent
in most network paradigms.

Figure 2: A three pattern ring assembly. Pattern i conditions ring i. Each DPNL
supports a microhypothesis (MH) for time j.

$$X_i^{new} = X_i^{old} + A\left[-a X_i^{old} + b I_1 + c I_2\right]^{+}$$
Equation 1

$X_i^{new}$ = Activation for neuron i
$X_i^{old}$ = Old activation
A = Attack factor
a = Decay constant for old activation
b = Gain for activation from previous neuron (decouple)
$I_1$ = Activation from previous neuron
c = Gain for dot product
$I_2$ = Dot product of pattern $Z_i$ with unknown $U_i$

Before discussing fuzzy activation field
processing, we first examine two point-sensitive
decision rules in common use before
this study. They are:

$D_1$: The ring with the highest acceptable
activation in its last neuron wins.

$D_2$: The ring with the first activation of 1
wins the competition.

We eliminate $D_1$ immediately. Imagine
a ring with 100 neurons. Suppose the test
input is identical to the training pattern
except that time slice vector 99 is an attenuated
version of the true vector. DPNL 100
will not be fully excited. Another ring, conditioned
on a different signal, may win at the
end of time slice 100 by a simple twist of
fate at time 99.

The other rule has competitive merit but
it makes little sense when applied in a noisy
environment with rampant phase errors.
Transients may induce random neuron firing.

Point failure is serious because a system
which allows it to occur discounts historical
evidence in favor of the current state,
as does a Markov process. Other decision
rules are possible. Before addressing the
rule(s) of choice, we will discuss some
aspects of sampling and phase error.



Phase error can be illustrated with an 
example from the undersea application. 
Assume the acoustic signature is a periodic 
and deterministic function of propeller angle 
as it spins. Suppose the training pattern 
was sampled when the propeller was verti- 
cal. The remaining samples follow at equal 
intervals. During a sea trial, the probability 
that sampling started with a vertical pro- 
peller is small. This being the case, the test 
and the training signal are similar except for 
a phase difference. 

The re-entrant ring compensates for 
phase, although the effect is large only for 
small rings. It makes no difference which 
DPNL is first stimulated; DPNLs activate 
around the ring by virtue of the syndetic 
lines. Phase displacement lies latent in the 
T latch chain. 

A better decision rule or heuristic for 
pattern classification will now be sought. 

As a by-product of correlation, each DPNL
continually provides a statistic for testing
the hypothesis that the pattern is as expected.
The first DPNL in a ring holds the pattern
vector for time slice 1. Therefore, it estimates
the grade of membership or suitability
of the test pattern’s first vector. The second
DPNL estimates the suitability of the
second pattern vector. Anthropomorphically,
operation is a question and answer sequence:
"How well, on a unit scale, do you
like what you see at this time?" The problem
is to decide which of all the activations
generated by a DPNL during presentation
best indicates support for the hypothesis.

The answer is stated without proof; it is 
the height of the time series of activation for 
the DPNL, otherwise called the fuzzy possi- 
bility measure. It is just the contents of M. 

We estimate the compatibility of the
test pattern with the ring as a whole. The
appropriate operator is the sigma-count
[16] of the fuzzy set M. Kosko [8] proved
the sigma-count, ΣC, is a positive measure
of set cardinality. It represents support for
the ring’s hypothesis.

Up to the present time, the discussion 
has been limited to a single ring. More sig- 
nals require more rings. The supervisory 
system, if it exists, recruits empty rings and 
conditions them as necessary based on 
mean squared error criteria. 

Assuming $\Sigma C_i$ is support for signal i,
what rule robustly adjudicates the race for
classification?

This discussion argues that no point 
estimate from the ring during excitation will 
suffice as a fuzzy statistic, unless its value 
is unity. We propose the following calcula- 
tions as a foundation for a better decision 
rule. 

$$\mathrm{Support}_i = \sum_j M_{ij}$$
$$\mathrm{Non\text{-}support}_i = \mathrm{Card} - \mathrm{Support}_i$$

where support is belief in the hypothesis that the
pattern is i, and j is the DPNL index. Card
is the cardinality of the ring’s non-fuzzy
superset, i.e., the number of DPNLs per
ring. Non-support is the degree to which the
hypothesis is not warranted.

A walk through figure 3 data will clarify 
the procedure for classification. Morphologi- 
cally, the assembly that generated the data 
had 10 rings. Each had 20 DPNLs, one per 
time slot. 

The numbers in the top matrix are the 
contents of the max latches at the end of the 
excitation. Rows index the pattern while 
columns index time. 

The content of the max latch for the DPNL
for pattern 1, time 18, is 9 (the lack of a decimal
is a concession to display technology).
The 114 under ΣC is the support for pattern
1 while its non-support is 200 - 114 = 86.
With the implied decimal, these values are
.9, 11.4, 20, and 8.6.
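
The arithmetic of this walk-through can be checked with a short sketch. The latch values below are the Boat 2 row of the figure 3 display in tenths, with time slot 18 taken as 9 as stated in the text; the function names are illustrative and not part of the original system.

```python
# Max latch contents for pattern 1 (Boat 2), displayed without the decimal point.
boat2_latches = [0, 0, 2, 4, 6, 5, 5, 5, 4, 6, 6, 7, 7, 7, 6, 8, 7, 9, 10, 10]


def sigma_count(memberships):
    """Sigma-count of a fuzzy set: the sum of its membership grades."""
    return sum(memberships)


def non_support(memberships):
    """Cardinality of the non-fuzzy superset minus the support."""
    return len(memberships) - sigma_count(memberships)


display_support = sigma_count(boat2_latches)         # 114 on the display
display_non_support = 10 * len(boat2_latches) - display_support  # 200 - 114

# With the implied decimal restored, grades lie in the unit interval.
grades = [v / 10 for v in boat2_latches]
support = sigma_count(grades)
doubt = non_support(grades)
```

Here `display_support` reproduces the 114 in the ΣC column and `display_non_support` the 86 quoted above; `support` and `doubt` are the same quantities on the unit scale (about 11.4 and 8.6).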

The certainty ratio (CR) is calculated: 


$$CR_i = \frac{\mathrm{Support}_i}{\sum_k \mathrm{Support}_k}$$

Next, each CR is divided by the maxi- 
mum in the CR column. Finally, "FUZZY 
MEMBERSHIP" (FM) is reported as a per- 
centage. 

The quantities under "FUZZY MEM- 
BERSHIP" indicate the degree to which the 
training pattern is supported by the data if a 
decision must be made. Its associated 
degree of non-support is 100 - itself. For 
pattern 1, the values are 100 and 0. 
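
As a check on the procedure, the certainty ratio and fuzzy membership columns can be recomputed from the ΣC column of the figure 3 display. The sketch below is illustrative only; the function names are assumptions, and rounding to three decimals and to whole percentages is inferred from the way the display prints its values.

```python
# Sigma-count (support) per ring, as read from the figure 3 display.
support = {
    "Boat 2": 114, "Boat 3": 0, "Elizabeth": 51, "SEINER": 14, "FF1041A": 0,
    "FF1041B": 12, "FFG41B": 0, "FFG41C": 0, "DREDGE": 20, "ZODIAC": 69,
}


def certainty_ratios(support):
    """CR_i = Support_i / sum of all supports (all zero if nothing is excited)."""
    total = sum(support.values())
    if total == 0:
        return {name: 0.0 for name in support}
    return {name: value / total for name, value in support.items()}


def fuzzy_membership(ratios):
    """Divide each CR by the largest CR and report it as a whole percentage."""
    top = max(ratios.values())
    if top == 0:
        return {name: 0 for name in ratios}
    return {name: round(100 * value / top) for name, value in ratios.items()}


cr = certainty_ratios(support)
fm = fuzzy_membership(cr)
```

The results agree with the display: for example, the Elizabeth has a certainty ratio of 0.182 and a fuzzy membership of 45, and the Zodiac 0.246 and 61.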


If a delayed decision is permitted, the
ΣC column is more appropriate than FM. If
the support for any ship is lower than a
threshold, classification can be deferred until 
more samples are available. 

The formal definition of the possibility
computations is as follows. Let the time slot
set for a single neuron be $A = \{1, 2, 3, \ldots, n\}$.
Let $x_i$ be the activation of a neuron at
time i:

$$x = \{x_1, x_2, x_3, \ldots, x_n\}$$

Then the possibility distribution:

$$\Pi_x = x_1/1 + x_2/2 + \cdots + x_n/n$$

is induced on the neuron. The term $x_1/1$
means: the possibility that the signal is
appropriate given the signal at time 1 is $x_1$.
Then the possibility measure

$$\pi(A) = \max(x_1, x_2, x_3, \ldots, x_n)$$

is held in the max latch, M.

FIGURE 3. Computer screen depicting the excitation signal Boat 2
with 40 % noise in its power spectrum. Numbers in the top matrix are
the contents of the max latch at termination. The last column is the
measure of support for the hypothesis implied by the row heading.
The screen reports: closest match for input signal, Boat 2; current
perturbation percent = 40; test number = 25.

Either the ΣC or the FM statistic is
now considered a better decision statistic
than those used by either $D_1$ or $D_2$. We
adopt a decision rule:

$$D_3 = \mathrm{arc}\left(\bigcup_i \Sigma C_i\right)$$

or

$$D_4 = \mathrm{arc}\left(\bigcup_i FM_i\right)$$

where the functor arc is a pointer back to the
pattern name. Thus $D_3(114)$ is Boat 2.
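
A minimal sketch of such a rule follows, assuming the union is taken with MAX and that arc returns the name of the ring that attains it. The optional threshold implements the deferred decision mentioned earlier; its value here is hypothetical, as is the function name.

```python
# Support (sigma-count) per ring from the figure 3 display.
support = {
    "Boat 2": 114, "Boat 3": 0, "Elizabeth": 51, "SEINER": 14, "FF1041A": 0,
    "FF1041B": 12, "FFG41B": 0, "FFG41C": 0, "DREDGE": 20, "ZODIAC": 69,
}


def decide(scores, threshold=None):
    """Return the pattern name with the largest score, or None to defer.

    The maximum plays the role of the fuzzy union; looking the name up from
    the winning score plays the role of arc. `threshold` is an assumed,
    application-specific minimum support below which no decision is made.
    """
    winner = max(scores, key=scores.get)
    if threshold is not None and scores[winner] < threshold:
        return None
    return winner


choice = decide(support)
deferred = decide(support, threshold=150)
```

For the figure 3 data, `choice` is Boat 2, while a (hypothetical) threshold of 150 would defer classification until more samples are available.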

One objective of pattern recognition is
to generalize [1], to go from a specific signal
to the class to which it belongs. An
admissible algorithm is driven by the joint
similarity between the training pattern(s)
and the set of all patterns produced by the
same signal source. It infers that while the 
test signal is different from any in the train- 
ing set, it has the gross properties of, say, a 
frigate. There are at least two variations on 
generalization. The first is mentioned only 
for the sake of completeness. 

Variation 1 examines the output string 
of the supervisory system if present. For 
example, the ring’s postprocessor declares 
the signal to be the frigate Mir on the basis 
of its emissions. Variation 1 parses the text 
string for the underlined word and subse- 


quently declares the class to be frigate . 

Variation 2 scrutinizes the output of the 

ring - the EC or FUZZY MEMBERSHIP 
tal data alone. Figure 3 illustrates that two 
of the fishing boats excite the ring but that 
the Zodiac raft does also. These boats have 
much in common in the frequency domain. 
On a broader note, the ring permits the test- 
ing of any fuzzy hypothesis that can be con- 
structed from the universe of discourse. 

Analysis of the activation history or
field is facilitated with an element of set theory
called the power set - the set of all subsets
from the universe of discourse. Its cardinality
is $2^n$, where n is the number of elements
in the universe of discourse.

Two constructs derivable from the power
set are the frame of discernment or disjunctive
frame (Strat [11]) and the frame of
concernment or conjunctive frame. In all,
they contain $2^{(n+1)} - (n+1)$ unique hypotheses
and from them, any hypothesis with conjunction/disjunction
can be constructed. For example,
support for the hypothesis:

H(Elizabeth)

is 51 in the ΣC column. We can also test:

H(the Elizabeth or boat 3)

with MAX, the fuzzy set union operator.

The arithmetic is a straightforward
application of the sigma-count, and the
Frank [3] triangular norms and co-norms
MAX and MIN:

H(Elizabeth) = 51

H(Elizabeth or boat 3) = MAX(0, 51) = 51

H(Elizabeth or boat 3) AND H(FF1041A or FF1041B) = MIN(12, 51) = 12


Yager [14] and Zadeh [15,17] discuss other 
operators for reasoning with uncertain infor- 
mation. Similar exercises apply to FM. 
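
The compound hypotheses above can be reproduced with a small sketch that uses MAX for disjunction and MIN for conjunction. The helper names are illustrative; the support values are the ΣC entries from the figure 3 display.

```python
# Sigma-count support for the rings named in the example.
support = {"Elizabeth": 51, "Boat 3": 0, "FF1041A": 0, "FF1041B": 12}


def h_or(*names):
    """Support for a disjunction of patterns: fuzzy union (MAX)."""
    return max(support[name] for name in names)


def h_and(*values):
    """Support for a conjunction of hypotheses: fuzzy intersection (MIN)."""
    return min(values)


single = support["Elizabeth"]
either_small_boat = h_or("Elizabeth", "Boat 3")
either_frigate = h_or("FF1041A", "FF1041B")
compound = h_and(either_small_boat, either_frigate)
```

Here `either_small_boat` is 51, `either_frigate` is 12, and `compound` is 12, matching the arithmetic in the text.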


EXPERIMENTS AND NOISE 

Signatures from ten vessels were col- 
lected from San Diego Bay with an omnidi- 
rectional hydrophone. They joined an exten- 
sive library of marine mammal vocalizations, 
munition launches, seismic explosions, and 
other acoustic events. 

We soon developed a comprehensive 
procedure for simulating ship encounters 
and testing performance. Classification mer- 
it is equated to the probability of correct 
classification, PCC. Graphs of this function 
have PCC on the vertical axis and percent 
noise perturbation on the horizontal. The 
graph indicates the sensitivity level of the 
procedure under test to levels of increasing 
noise. Noise is added as percentage of sig- 
nal power from 0 to 100 percent. A noise 
level of 25% implies there is 25% uncertain- 
ty in the true value of any frequency bin. 

All signatures had ambient bay noise. 
They also contained aperiodic impulse 
spikes from nearby power rails. These were 
attenuated with a median filter (Taber [12]) 
before Fourier transformation. 

Uniformly distributed pseudo-white 
noise is not the only kind of noise in the 
ocean. As an aid to more comprehensive sit- 
uations, we used five noise types. 

• uniform white 

• ramp up with frequency 


• ramp down with frequency 

• time shift 

• convex combinations of signals 

Uniform noise occurs in narrow band 
samples. Our passband was 0-3.5 KHz. In 
some cases, the uniform noise assumption
is justifiable. However, the sudden
appearance of a second boat in the water,
for example, injects frequency- and range-dependent
noise.

Ramp up and ramp down are analogous 
to linear chirp in radar. The amount of noise 
is frequency dependent; the amplitude of the 
zero mean noise either increases or de- 
creases with frequency, simulating ocean 
anomalies and opening and closing ranges. 
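
One way to picture the uniform and ramp perturbations is the sketch below. It is an assumption-laden illustration, not the procedure used in the study: the paper does not specify how the percentage is applied, so the sketch assumes a zero mean, uniformly distributed, multiplicative perturbation of each power spectrum bin, with the percentage interpolated linearly across frequency. The function name and parameters are hypothetical.

```python
import random


def perturb(spectrum, low_pct, high_pct, rng):
    """Perturb each frequency bin by a zero mean uniform percentage.

    The allowed percentage ramps linearly from `low_pct` at the first bin to
    `high_pct` at the last. Equal values give uniform noise, low < high gives
    ramp up, and low > high gives ramp down. (Assumed noise model.)
    """
    n = len(spectrum)
    noisy = []
    for k, power in enumerate(spectrum):
        frac = k / (n - 1) if n > 1 else 0.0
        pct = low_pct + (high_pct - low_pct) * frac
        noisy.append(power * (1.0 + rng.uniform(-pct, pct) / 100.0))
    return noisy


rng = random.Random(0)
flat = [1.0] * 8
uniform_25 = perturb(flat, 25, 25, rng)   # 25% uncertainty in every bin
ramp_up = perturb(flat, 25, 75, rng)      # 25% at the low end, 75% at the top
ramp_down = perturb(flat, 75, 25, rng)    # the reverse situation
```

Under this model, a 25% level means each bin may deviate by at most a quarter of its true value, which is one reading of the statement that there is 25% uncertainty in any frequency bin.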

Time shifts are expected. Seldom will 
the training signal be an exact replica of the 
test. An ocean buoy for monitoring harbor 
traffic must contend with tracking vessels 
over a range of perhaps hundreds of miles. 
In the laboratory, we simulate the buoy by 
taking the test signal from a different tape or 
tape segment than the training signal. 

Finally, convex combinations of existing 
signals test the resolving power of the net- 
work to identify fleets of ships. Can it gener- 
ate activation consistent with known signal 
mixes? For example, if we mix 75% destroy- 
er with 25% frigate, can the network confirm 
the 75/25 split? Convexity means that for
any combination C of signals $S_k$,

$$C = \alpha S_1 + \beta S_2 + \gamma S_3 + \cdots$$

the coefficients sum to 1.
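
A convex mixture of this kind can be sketched as follows. The function name and the toy two-bin spectra are illustrative only; the check that weights are nonnegative follows from the usual definition of a convex combination.

```python
def convex_mix(signals, weights):
    """Return the convex combination of equal-length signals.

    Weights must be nonnegative and sum to 1, as convexity requires.
    """
    if len(signals) != len(weights):
        raise ValueError("one weight per signal is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    length = len(signals[0])
    return [sum(w * s[k] for w, s in zip(weights, signals)) for k in range(length)]


# Toy spectra standing in for a destroyer and a frigate (illustrative values).
destroyer = [4.0, 0.0]
frigate = [0.0, 8.0]
mixture = convex_mix([destroyer, frigate], [0.75, 0.25])
```

The resulting `mixture` is the 75/25 blend that would be presented to the network in the convexity test.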

RESULTS 

Our experiments tested the ring’s ability
to recognize signals in the presence of
severe noise. Figures 4 and 5 illustrate the
varied behavior of the algorithms in uniform
zero mean noise. Cross correlation almost
always recognizes the signal. Backpropagation
(BP) trained with the average Fourier
power spectrum rather than the entire non-stationary
spectrum does not perform adequately.
Its failure justifies the use of the
non-stationary signals for training in spite
of the expensive training sweeps. The BP
network for figure 4 has 16 input neurons, 9
middle layer neurons, and 5 in the output
layer (16:9:5). The BP networks for figures
5 and 7 are 320:20:5. BP training time on a
Sun 3/140 for ten non-stationary signals is
approximately 30 minutes. Training time for
the ring is less than a second if training time
is measured from the time a pattern is
interned to the time the network is prepared
to recognize signals.

The non-fuzzy rules make the ring into
a very narrow bandwidth matched filter for
uniform noise. It does not matter whether
$D_1$ or $D_2$ is selected; the results are similar
to the trace in figure 4b.

The ramp up and ramp down tests
caused a single fault in the PCC graph. All
plots for the fuzzy ring were constant at
100% PCC for noise levels that started at
25% in the 0-60 Hz band and escalated to
75% in the 3.5 KHz band, and constant for
the reverse situation for noise ramping
down from 75% to 25% with increasing frequency.
Thus, the effective correct classification
percentage is 99.99% for these tests.

The format of the convexity test was
simply to mix the signals then excite the
network with the mixture. Work is still in
progress to detect if the mix percentage
propagates to the decision metrics discussed
earlier.

Figure 4. Probability of correct classification (PCC) as a function of additive noise percentage for
back-propagation (BP), the neuron ring (NR), cross correlation (CC), and the non-fuzzy ring
structure using non-fuzzy rules $D_1$ or $D_2$. Non-fuzzy performance is approximate. BP trained on
spatial data; spatio-temporal patterns produced by averaging Fourier data records. Top and
bottom graphs pertain to the frigate FF 1041A and to the Elizabeth, respectively. Each trace is
based on 5000 simulated ship encounters or trials. (Horizontal axis: noise, 0 to 100.)

Figure 5. Performance of backpropagation and the fuzzy ring in uniform additive zero mean
percentage noise for the frigate and boat 2. Training is with spatio-temporal patterns.

Figure 6. PCC as a function of additive noise. Signals shifted in time by several minutes.
Top diagrams are for the frigate FF1041A and the bottom diagrams are for boat 2. BP training
is spatial only.

Figure 7. Probability of correct classification as a function of uniform additive noise.
Time shift ~ 2 minutes. a) BP and NR for boat 2; b) cross correlation for boat 2;
c) BP and NR for the frigate; and d) cross correlation for the frigate. Training is
spatio-temporal.

SUMMARY 

This study implies that the analysis of 
the historical activation record or activation 
field is more effective than using point esti- 
mates, at least for simple structures such as 
the neuron ring. We simulated over a quar- 
ter of a million ship encounters in the study; 
the graphs indicate typical performance. The 
breadth of the noise and signal characteristics
lends credence to the thesis of this
paper, namely, that excitation induces a pos- 
sibility distribution on the neuron’s activa- 
tion. The total support for the ring’s hypoth- 


esis is the sum of each neuron’s possibility 
measure. This support is a better indicator 
of pattern class than those used before this 
study. 
WHAT CAN FUZZY LOGIC DO FOR NEURAL NETWORKS?
This paper describes the results of a study to find minimal neural structures
that are able to recognize ships from underwater recordings.

We propose some architectural changes to the neuron.

Experiments were designed to test the ability of neural networks to 
correctly classify ships. During the tests, it became clear that the 
avalanche worked as an extremely narrow band matched filter. 
We went back to the basics and asked:

"What is the neuron doing?"

"Can we boost performance?"
AT THE HEART OF MOST NEURAL NETWORKS

DOT PRODUCT NEURON PERFORMS

$$u \cdot v = x$$

u = prestored reference vector
v = unknown or test vector
x = nascent excitation scalar

$$F(x) \rightarrow a \in (0,1)$$
WHY THIS WORKS

Suppose u, v are positive unit vectors on the hypersphere in $\mathbb{R}^n$

$$\frac{u \cdot v}{\|u\| \, \|v\|} = \cos\phi$$

But $\|u\| \cdot \|v\| = 1$ (unit vectors)

So $\cos\phi = u \cdot v \in (0,1)$
CORRELATION

$$R_{uv}(r\,\Delta t) = \frac{1}{N-r} \sum_{i=1}^{N-r} u_i v_{i+r}$$

for time delay = 0:

$$R_{uv} = \frac{1}{N} \sum_{i=1}^{N} u_i v_i$$

if N is constant over all patterns,

$$u \cdot v \propto R_{uv}$$

Activation by correlation
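
The relationship between the dot product, the cosine, and zero-lag correlation can be verified numerically. The sketch below is illustrative; the helper names and the example vectors are assumptions, not data from the study.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def zero_lag_correlation(u, v):
    """R_uv at time delay 0: the mean of the elementwise products."""
    return dot(u, v) / len(u)


# Two nonnegative vectors, e.g. power spectra, projected onto the unit hypersphere.
u = normalize([3.0, 4.0, 0.0])
v = normalize([4.0, 3.0, 0.0])

similarity = dot(u, v)                 # cosine of the angle between u and v
correlation = zero_lag_correlation(u, v)
```

For unit vectors with nonnegative components, `similarity` lies in the unit interval, and it differs from `correlation` only by the constant factor N, which is why the dot product can stand in for correlation when N is the same for all patterns.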
BASIC OPERATION

$$u \cdot v = x$$
Activation = F(x)

Example: $F(x) = \frac{1}{1 + e^{-x}}$

What happens to information if $F(x) \neq x$?

a. We make up spurious information

or

b. We discount good information
If we add spurious information (spurium?)
are we degrading performance?


If we discount good information, 
are we degrading performance ? 
The problem domain is underwater acoustics


We would like a system to tell us what ship or kind of ship is in the water 
We designed a series of experiments to find a system model 
DATA CAPTURE


• Record ships with a hydrophone 

• Filter the analog waveform at 8 KHz 

• Digitize .6 seconds at 20 KHz 

• 1024 pt Fourier transform on 20 time slots per ship 

• Compute power spectra 

• Train neuron rings for the pattern set 

TEST 

• Pick test signal 

• Add noise to test 

• Classify 
NOISE TYPES


• Pseudo-white 

• Ramp up with frequency 

• Ramp down with frequency 

• Time shift 

• Convex combinations of existing signals 
$$X_i^{new} = X_i^{old} + A\left[-a X_i^{old} + b I_1 + c I_2\right]^{+}$$

$X_i^{new}$ = Activation for neuron i
$X_i^{old}$ = Old activation
A = Attack factor
a = Decay constant for old activation
b = Gain for activation from previous neuron (decouple)
$I_1$ = Activation from previous neuron
c = Gain for dot product
$I_2$ = Dot product of pattern $Z_i$ with unknown $U_i$
[Graph: Probability of correct classification vs. noise. Frigate USS Bradley,
avalanche with non-fuzzy processing. Horizontal axis: percentage of noise
(50 trial runs per point).]
Consider fuzzy sets and multi-valued logic.

• Distribution free

• Graded membership is an attractive idea

Probability
P(e or ~e) = 1

Possibility
Po(e or ~e) = max(e, ~e)

P(.8 or .2) = 1

Po(.8 or .2) = .8
LET’S ALLOW THE FOLLOWING F

$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$$

THEN WE MINIMIZE THE MODIFICATION OF INFORMATION

This enables fuzzy processing

[Figure: The Dot Product Neuron with Latches. General Dynamics Electronics Division.]
PROCEDURE

• Gate each pattern to every neuron

• Present all 20 vectors

• Decide which pattern it is: $D_1$ ... $D_4$

$D_1$: The ring with the highest acceptable
activation in its last neuron wins.

$D_2$: The ring with the first activation of 1
wins the competition.

$$D_3 = \mathrm{arc}\left(\bigcup_i \Sigma C_i\right)$$

$$D_4 = \mathrm{arc}\left(\bigcup_i FM_i\right)$$
Figure 4. Probability of correct classification (PCC) as a function of additive noise percentage for
back-propagation (BP), the neuron ring (NR), cross correlation (CC), and the non-fuzzy ring
structure using non-fuzzy rules $D_1$ or $D_2$. Non-fuzzy performance is approximate. BP trained on
spatial data only; spatio-temporal patterns reduced by averaging Fourier data records.
Top frigate; bottom boat 2. (Horizontal axis: noise, 0 to 100.)

Figure 5. Performance of backpropagation and the fuzzy ring in uniform additive zero mean noise

Figure 7. Probability of correct classification as a function of uniform additive noise.
Time shift ~ 2 minutes. a) BP and NR for boat 2; b) cross correlation for boat 2;
c) BP and NR for the frigate; and d) cross correlation for the frigate. Training is
spatio-temporal.
FUZZY PROCESSING

1. Patterns induce a possibility distribution on the neuron

2. Height of the distribution equates to possibility measure

3. ΣC of the heights → classification
DECISION RULES

NON-FUZZY

$D_1$: The ring with the highest acceptable
activation in its last neuron wins.

$D_2$: The ring with the first activation of 1
wins the competition.

FUZZY

$$D_3 = \mathrm{arc}\left(\bigcup_i \Sigma C_i\right)$$
$$D_4 = \mathrm{arc}\left(\bigcup_i FM_i\right)$$

ΣC = Sigma-count

arc = pre-image of the argument
STATISTICS FOR TEST FILE: SHIPS

INPUT SIGNAL: Boat 2

WINDOW      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20   ΣC
Boat 2      0  0  2  4  6  5  5  5  4  6  6  7  7  7  6  8  7  9 10 10  114
Boat 3      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    0
Elizabeth   0  0  2  4  3  3  3  3  3  2  2  2  2  2  2  2  2  4  5  5   51
SEINER      0  0  2  1  1  1  1  1  1  1  1  1  1  1  1  0  0  0  0  0   14
FF1041A     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    0
FF1041B     0  0  2  1  1  1  1  1  1  1  1  1  1  0  0  0  0  0  0  0   12
FFG41B      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    0
FFG41C      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    0
DREDGE      0  0  0  2  1  1  1  1  1  1  1  1  1  1  1  1  1  2  2  1   20
ZODIAC      0  0  2  4  5  5  5  4  4  4  4  4  3  3  3  3  3  3  4  6   69

CERTAINTY MEASURES FOR INPUT SIGNAL: Boat 2

TRAINING SIGNAL   CERTAINTY RATIO   FUZZY MEMBERSHIP
Boat 2            0.407             100
Boat 3            0.000               0
Elizabeth         0.182              45
SEINER            0.050              12
FF1041A           0.000               0
FF1041B           0.043              11
FFG41B            0.000               0
FFG41C            0.000               0
DREDGE            0.071              18
ZODIAC            0.246              61

CLOSEST MATCH FOR INPUT SIGNAL: Boat 2
CURRENT PERTURBATION PERCENT = 40
TEST NUMBER = 25
SUMMARY

We fuzzified the neuron

$$F(x) = \begin{cases} x & \text{if } 0 < x \le 1 \\ 1 & \text{if } x > 1 \\ 0 & \text{if } x < 0 \end{cases}$$

graded membership

M latches maximum F(x)
T latches time

possibility

ΣC of M yields hypothesis support

graded membership

0 approaches 0 


near optimal performance 
Center for Neural Science at Brown. In 1983, he joined Nestor, Inc., as its
first employee and vice president of research and development. Dr. Reilly 
was instrumental in establishing Nestor's research and development office, 
and under his direction, the company has developed an adaptive pattern 
recognition technology based upon neural network principles - applying that 
system to problems in character recognition, speech recognition, object 
recognition, and risk assessment in financial services. 


ADAPTIVE PATTERN RECOGNITION USING
A MULTI-NEURAL NETWORK LEARNING SYSTEM

Abstract

A learning system composed of multiple neural networks is discussed, and examples
of its application to problems in adaptive pattern recognition are presented.
The system makes use of multiple restricted Coulomb energy (RCE) networks
that are powerful pattern classification subsystems, able to dynamically
learn to separate nonlinearly separable pattern classes in feature space, as
well as to estimate class probabilities in nonseparable portions of the
feature space. A controller integrates the responses of these various 
multiple neural networks to produce an overall system response. Addition- 
ally, the controller determines the training signals directed to the various 
component networks of the system to ensure that networks train to make the 
decisions for which they are best suited. Results of applying the system to 
problems in character recognition, industrial parts inspection, and decision 
support for risk analysis will be reviewed. 
KNOWLEDGE REPRESENTATION BY LINGUISTIC
TRANSITIVE CLOSURES OF TRAPEZOIDAL FUZZY NUMBERS 

Abstract 

We present a theory for the representation and manipulation of uncertainties 
that might be supplied by an expert (or team thereof) about object-pair 
relationships in some knowledge domain. We propose a theory based on the 
representation of relational knowledge by semantic term sets and trapezoidal 
fuzzy numbers. The extended max - (*) linguistic transitive closure (LTC) 
is offered as a means for consistency enforcement and completion of partial 
knowledge in the relational network. Theorems are given that provide 
conditions for the existence and uniqueness of the LTC under three 
(extended) T-norms. We present an algorithm for computing each LTC and 
exhibit a number of features of this method through numerical examples. 
RELATION BETWEEN UNCERTAINTY
REPRESENTATION IN DATA BASES AND RULE-BASED SYSTEMS 

Abstract 

Uncertainty in a rule (if A, then B) arises from its deductive validity, the 
preciseness of the antecedent A, and the proximity of A to the data to which 
it is matched. The latter two causes of uncertainty are both related to the 
data and its representation. Uncertainty in data represented in data bases 
takes the form of null values, range values, nonatomic values (e.g., 
embedded relations), and various representations based on fuzzy set theory. 
There are similarities between the instantiation of the terms in a query and 
the action of matching rule antecedents. The latter is further complicated 
by the unification process, which, in some ways, resembles evaluation of 
transitive queries. These and other correspondences (and differences) are 
examined. 
LINEAR FUZZY CONTROLLER
Abstract

We consider a process controlled by a controller described by an n-th order
linear ordinary differential equation toward its target output. As a
special case, the controller is a proportional-integral-derivative (PID)
controller. We show how to construct a linear fuzzy controller that gives 
precisely the same control as the PID controller. It is speculated that 
nonfuzzy controllers and fuzzy controllers may coincide on an unsuspectingly 
large class of control problems. 
FUZZY CONTROL THEORY: A NONLINEAR CASE
Abstract

We prove theoretically that a nonlinear fuzzy controller is a nonfuzzy
proportional-integral-derivative (PID) controller with proportional gain,
integral constant, and derivative constant changing with error, rate change
of error, and rate change of error rate about a setpoint of a process. The
nonlinear fuzzy controller consists of the following parts:

1. The linear defuzzification algorithm

2. The linear fuzzy control rules

3. Zadeh's AND and OR fuzzy logics for evaluating the fuzzy control rules

4. The nonlinear defuzzification algorithm 

The nonlinear fuzzy controller is a linear fuzzy controller which is pre- 
cisely equivalent to a nonfuzzy PID controller if the linear defuzzification 
algorithm is used instead of the nonlinear one listed above. 

The results of computer simulation reveal that the control performances of 
the nonlinear fuzzy controller and a nonfuzzy PID controller are almost the 
same if a linear process is controlled. However, the nonlinear fuzzy 
controller can control some nonlinear processes much better than a nonfuzzy 
PID controller does. 


